Kernel PCA based clustering for inducing features in text categorization

We study dimensionality reduction, or feature selection, for the text document categorization problem. We focus on the first step in building a text categorization system: choosing an efficient numerical representation of natural-language text. This numerical representation is then used by machine learning algorithms. We propose a representation based on word clusters. We build a kernel matrix from the distribution of words over the different categories and apply kernel PCA to extract a low-dimensional representation of the words. On this low-dimensional representation we use K-means clustering to group words into clusters, and we subsequently use these clusters in the document categorization task. We show that kernel PCA based clustering gives performance better than or comparable to that of several advanced clustering methods when applied to the standard Reuters corpus.
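The pipeline described in the abstract (word/category kernel matrix, kernel PCA, K-means, cluster-based document features) can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation: each word is represented by its empirical distribution P(category | word); the kernel is assumed to be a Gaussian RBF with a hand-picked `gamma`; K-means uses a farthest-point initialisation (a deterministic simplification of k-means++); and documents are assumed to be represented by summed term counts per word cluster. All function names and the toy counts are hypothetical.

```python
import numpy as np


def word_category_profiles(counts: np.ndarray) -> np.ndarray:
    """Row-normalise (n_words, n_categories) counts into P(category | word).

    Assumption: words that never occur get a uniform profile.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    return np.where(totals > 0, counts / np.maximum(totals, 1e-12), 1.0 / counts.shape[1])


def rbf_kernel(X: np.ndarray, gamma: float) -> np.ndarray:
    """Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2); an assumed kernel choice."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


def kernel_pca(K: np.ndarray, n_components: int) -> np.ndarray:
    """Project the training points onto the leading principal axes in feature space."""
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one      # centre the data in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)            # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:n_components]   # largest eigenvalues first
    lam = np.clip(eigvals[idx], 0.0, None)           # guard against tiny negatives
    # Projection of training point j onto axis k equals sqrt(lambda_k) * v_k[j].
    return eigvecs[:, idx] * np.sqrt(lam)


def kmeans(Z: np.ndarray, k: int, n_iter: int = 100, seed: int = 0) -> np.ndarray:
    """Lloyd's algorithm with farthest-point initialisation; one label per row of Z."""
    rng = np.random.default_rng(seed)
    centers = [Z[rng.integers(len(Z))]]
    for _ in range(1, k):
        # Next center: the point farthest from all centers chosen so far.
        d2 = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[d2.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def cluster_features(doc_word_counts: np.ndarray, labels: np.ndarray, k: int) -> np.ndarray:
    """Map (n_docs, n_words) term counts to (n_docs, k) word-cluster counts."""
    membership = np.eye(k)[labels]   # (n_words, k) one-hot cluster membership
    return doc_word_counts @ membership


# Hypothetical toy data: 6 words counted over 2 categories.
word_cat_counts = np.array([[9, 1], [8, 0], [10, 1], [0, 7], [1, 9], [0, 12]])
P = word_category_profiles(word_cat_counts)
Z = kernel_pca(rbf_kernel(P, gamma=5.0), n_components=2)
labels = kmeans(Z, k=2)

# Two hypothetical documents as term-count vectors over the same 6 words.
docs = np.array([[2, 1, 0, 0, 0, 1], [0, 0, 0, 3, 1, 2]])
X = cluster_features(docs, labels, k=2)
```

In this toy setup the first three words lean towards one category and the last three towards the other, so K-means on the kernel PCA coordinates should separate them into two clusters, and each document collapses from six term counts to two cluster counts. The resulting matrix `X` is what a downstream classifier would consume in place of the raw vocabulary.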

Zsolt Minier, Lehel Csató

Document Categorization | ESANN 2007 | Kernel PCA | Low-dimensional Representation | Neural Networks |

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2007
Where	ESANN
Authors	Zsolt Minier, Lehel Csató

Sciweavers