Comparing Dimension Reduction Techniques for Document Clustering

This research presents a systematic study of four dimension reduction techniques for the text clustering problem, using five benchmark data sets. The four methods are Independent Component Analysis (ICA), Latent Semantic Indexing (LSI), Document Frequency (DF) and Random Projection (RP). ICA and LSI are clearly superior when the k-means clustering algorithm is applied, irrespective of the data set. Random projection consistently returns the worst results, which appears to be due to the noise distribution characterizing the document clustering task.
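To make two of the named techniques concrete, here is a minimal standard-library sketch of Document Frequency selection and Random Projection on a toy corpus. It is not the authors' implementation: the corpus, the function names, the Gaussian projection matrix and the target dimensions are all assumptions chosen for illustration. ICA and LSI are left out because they need a linear algebra library. In a pipeline like the one the abstract describes, the reduced vectors would then be passed to k-means.

```python
import math
import random
from collections import Counter


def document_frequency_select(docs, k):
    """Keep the k terms that occur in the most documents (DF selection)."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    # Sort by descending document frequency, then alphabetically for determinism.
    ranked = sorted(df, key=lambda t: (-df[t], t))
    return ranked[:k]


def term_count_matrix(docs, vocab):
    """Build a dense documents-by-terms count matrix over the given vocabulary."""
    index = {term: j for j, term in enumerate(vocab)}
    rows = []
    for tokens in docs:
        row = [0.0] * len(vocab)
        for tok in tokens:
            if tok in index:
                row[index[tok]] += 1.0
        rows.append(row)
    return rows


def random_projection(matrix, k, seed=0):
    """Project each row onto k random Gaussian directions, scaled by 1/sqrt(k)."""
    rng = random.Random(seed)
    d = len(matrix[0])
    # Assumption: a dense Gaussian projection matrix; other choices exist.
    proj = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(k)] for _ in range(d)]
    return [[sum(row[i] * proj[i][j] for i in range(d)) for j in range(k)]
            for row in matrix]


# Toy corpus (hypothetical, for illustration only).
docs = [
    "stock market trading shares".split(),
    "market shares fall in trading".split(),
    "football match goal".split(),
    "goal scored in football match".split(),
]

# DF: reduce the vocabulary to the 4 most widespread terms.
df_vocab = document_frequency_select(docs, k=4)
df_reduced = term_count_matrix(docs, df_vocab)

# RP: project the full term-count vectors down to 3 dimensions.
full_vocab = sorted({tok for tokens in docs for tok in tokens})
rp_reduced = random_projection(term_count_matrix(docs, full_vocab), k=3)
```

DF selection keeps original terms as features, so the reduced space stays interpretable, whereas random projection mixes all terms into each new coordinate.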

Bin Tang, Michael A. Shepherd, Malcolm I. Heywood, Xiao Luo

AI 2005 | Artificial Intelligence | Data Sets | Random Projection | Text Clustering Problem |

Post Info

Added	26 Jun 2010
Updated	26 Jun 2010
Type	Conference
Year	2005
Where	AI
Authors	Bin Tang, Michael A. Shepherd, Malcolm I. Heywood, Xiao Luo

Sciweavers
