Efficient Prediction-Based Validation for Document Clustering

Recently, stability-based techniques have emerged as a very promising solution to the problem of cluster validation. An inherent drawback of these approaches is the computational cost of generating and assessing multiple clusterings of the data. In this paper we present an efficient prediction-based validation approach suitable for application to large, high-dimensional datasets such as text corpora. We use kernel clustering to isolate the validation procedure from the original data. Furthermore, we employ a prototype reduction strategy that allows us to work on a reduced kernel matrix, leading to significant computational savings. To ensure that this condensed representation accurately reflects the cluster structures in the data, we propose a density-biased selection strategy. This novel validation process is evaluated on a large number of real and artificial datasets, where it is shown to consistently produce good estimates for the optimal number of clusters.
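The idea of validating on a reduced kernel matrix can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' procedure: the cosine kernel, the density estimate (mean similarity of a point to all others), the choice to favor denser points, and the hypothetical function names `cosine_kernel`, `density_biased_prototypes`, and `reduced_kernel`. The abstract does not specify how the density bias is computed.

```python
import math
import random


def cosine_kernel(vectors):
    """Return the n x n cosine-similarity kernel matrix for a list of vectors."""
    # A zero vector gets norm 1.0 so that its kernel row is all zeros.
    norms = [math.sqrt(sum(v * v for v in vec)) or 1.0 for vec in vectors]
    n = len(vectors)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            dot = sum(a * b for a, b in zip(vectors[i], vectors[j]))
            K[i][j] = K[j][i] = dot / (norms[i] * norms[j])
    return K


def density_biased_prototypes(K, m, seed=0):
    """Pick m distinct prototype indices, favoring points in dense regions.

    ASSUMPTION: density is estimated as the mean kernel similarity of a
    point to all other points. The paper's actual strategy may differ.
    Requires len(K) >= 2 and m <= len(K).
    """
    rng = random.Random(seed)
    n = len(K)
    weights = {
        i: max((sum(K[i]) - K[i][i]) / (n - 1), 1e-12) for i in range(n)
    }
    chosen = []
    for _ in range(m):
        # Weighted draw without replacement: remove each pick from the pool.
        candidates = list(weights)
        pick = rng.choices(
            candidates, weights=[weights[c] for c in candidates], k=1
        )[0]
        chosen.append(pick)
        del weights[pick]
    return sorted(chosen)


def reduced_kernel(K, idx):
    """Restrict the kernel matrix to the selected prototype rows and columns."""
    return [[K[i][j] for j in idx] for i in idx]


# Toy usage: five 2-D "documents", reduced to three prototypes.
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 1.0]]
K = cosine_kernel(docs)
prototypes = density_biased_prototypes(K, m=3, seed=1)
K_small = reduced_kernel(K, prototypes)
```

Any kernel clustering or validation step would then operate on `K_small` (size m x m) instead of `K` (size n x n), which is where the computational saving described above would come from.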

Derek Greene, Padraig Cunningham

Cluster Validation | ECML 2006 | Efficient Prediction-based Validation | Machine Learning | Validation Procedure

Added	22 Aug 2010
Updated	22 Aug 2010
Type	Conference
Year	2006
Where	ECML
Authors	Derek Greene, Padraig Cunningham
