W-kmeans: Clustering News Articles Using WordNet

Document clustering is a powerful technique that has been widely used for organizing data into smaller, manageable information kernels. Several approaches have been proposed; however, they suffer from problems such as synonymy, ambiguity, and the lack of descriptive labels for the generated clusters. We propose enhancing the standard kmeans algorithm with external knowledge from WordNet hypernyms in two ways: enriching the “bag of words” before the clustering process and assisting the label generation procedure after it. Our experiments revealed a significant improvement over standard kmeans on a corpus of news articles drawn from major news portals. Moreover, the cluster labeling process generates useful, high-quality cluster tags.
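The two-step idea in the abstract (hypernym enrichment before clustering, hypernym-based labeling after it) can be illustrated with a minimal sketch. This is not the authors' implementation: `TOY_HYPERNYMS` is a hypothetical hand-written dictionary standing in for real WordNet lookups, cosine similarity and the first-k seeding are assumptions made for the example, and the function names `enrich`, `kmeans`, and `label` are invented here.

```python
from collections import Counter
import math

# Hypothetical stand-in for WordNet hypernym lookups (illustrative only,
# not actual WordNet output).
TOY_HYPERNYMS = {
    "dog": "animal", "cat": "animal", "horse": "animal",
    "car": "vehicle", "truck": "vehicle", "bus": "vehicle",
}

def enrich(tokens, hypernyms=TOY_HYPERNYMS):
    """Return a bag of words extended with one hypernym per known token."""
    bag = Counter(tokens)
    for tok in tokens:
        if tok in hypernyms:
            bag[hypernyms[tok]] += 1
    return bag

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def kmeans(bags, k, iterations=10):
    """Plain k-means over bags of words; the first k documents seed the centroids
    (an assumption made to keep the example deterministic)."""
    centroids = [Counter(b) for b in bags[:k]]
    assignment = [0] * len(bags)
    for _ in range(iterations):
        # Assign each document to its most similar centroid.
        assignment = [max(range(k), key=lambda c: cosine(bag, centroids[c]))
                      for bag in bags]
        # Recompute each centroid as the summed counts of its members
        # (scale does not matter for cosine similarity).
        for c in range(k):
            members = [bag for bag, a in zip(bags, assignment) if a == c]
            if members:
                total = Counter()
                for bag in members:
                    total.update(bag)
                centroids[c] = total
    return assignment

def label(bags, assignment, cluster, hypernyms=TOY_HYPERNYMS):
    """Label a cluster with its most frequent hypernym term, if any."""
    hyper_terms = set(hypernyms.values())
    counts = Counter()
    for bag, a in zip(bags, assignment):
        if a == cluster:
            counts.update({t: n for t, n in bag.items() if t in hyper_terms})
    return counts.most_common(1)[0][0] if counts else None

docs = [
    ["dog", "bites", "man"],
    ["truck", "blocks", "road"],
    ["cat", "rescued"],
    ["bus", "strike"],
]
bags = [enrich(d) for d in docs]
assignment = kmeans(bags, k=2)
labels = [label(bags, assignment, c) for c in range(2)]
print(assignment, labels)
```

In this toy corpus the four documents share no surface words, so the only overlap the clustering step can use comes from the added hypernym terms. A real setup would replace the dictionary with actual WordNet queries and would need a policy for word-sense ambiguity and hypernym depth, neither of which is specified in the abstract.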

Christos Bouras, Vassilis Tsogkas

Information Technology | KES 2010 | Manageable Information Kernels | Quality Cluster Tags | Standard Kmeans |

Added	29 Jan 2011
Updated	29 Jan 2011
Type	Journal
Year	2010
Where	KES
Authors	Christos Bouras, Vassilis Tsogkas
