Clustering Documents Using a Wikipedia-Based Concept Representation

Available from: www.cs.waikato.ac.nz

Abstract. This paper shows how Wikipedia and the semantic knowledge it contains can be exploited for document clustering. We first create a concept-based document representation by mapping the terms and phrases within documents to their corresponding articles (or concepts) in Wikipedia. We also develop a similarity measure that evaluates the semantic relatedness between the concept sets of two documents. We test the concept-based representation and the similarity measure on two standard text document datasets. Empirical results show that, although further optimizations are possible, our approach already improves upon related techniques.
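The abstract does not spell out the mapping procedure or the exact similarity formula, so the following is only a minimal Python sketch of the general idea under stated assumptions. The anchor dictionary `ANCHORS`, the relatedness table `RELATEDNESS`, and all function names are hypothetical. The document similarity shown is a symmetric best-match average, a generic technique swapped in here for illustration, not necessarily the authors' measure.

```python
import re
from collections import Counter

# HYPOTHETICAL anchor dictionary: surface phrase -> Wikipedia article title.
# These few entries are invented for illustration only.
ANCHORS = {
    "document clustering": "Document clustering",
    "clustering": "Cluster analysis",
    "machine learning": "Machine learning",
    "neural network": "Artificial neural network",
    "wikipedia": "Wikipedia",
}
MAX_NGRAM = 3  # longest phrase length (in words) we try to match

# HYPOTHETICAL pairwise relatedness scores in [0, 1] between concepts.
RELATEDNESS = {
    frozenset({"Document clustering", "Cluster analysis"}): 0.8,
    frozenset({"Machine learning", "Cluster analysis"}): 0.6,
    frozenset({"Machine learning", "Artificial neural network"}): 0.7,
}


def to_concepts(text: str) -> Counter:
    """Map a document to a bag of concepts by greedy longest-match phrase lookup."""
    words = re.findall(r"[a-z]+", text.lower())
    concepts: Counter = Counter()
    i = 0
    while i < len(words):
        # Try the longest n-gram starting at position i first.
        for n in range(min(MAX_NGRAM, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in ANCHORS:
                concepts[ANCHORS[phrase]] += 1
                i += n
                break
        else:
            i += 1  # no phrase starting here maps to a concept
    return concepts


def relatedness(a: str, b: str) -> float:
    """Look up the relatedness of two concepts; identical concepts score 1."""
    if a == b:
        return 1.0
    return RELATEDNESS.get(frozenset({a, b}), 0.0)


def doc_similarity(c1: Counter, c2: Counter) -> float:
    """Symmetric best-match average of concept relatedness between two documents."""
    def directed(src: Counter, dst: Counter) -> float:
        total = sum(src.values())
        if total == 0 or not dst:
            return 0.0
        # Each concept in src, weighted by its frequency, takes its best match in dst.
        return sum(w * max(relatedness(c, d) for d in dst)
                   for c, w in src.items()) / total
    return (directed(c1, c2) + directed(c2, c1)) / 2
```

With these toy tables, `to_concepts("Document clustering with Wikipedia.")` yields the concepts Document clustering and Wikipedia, `to_concepts("Clustering and machine learning")` yields Cluster analysis and Machine learning, and `doc_similarity` between the two returns 0.4, driven entirely by the relatedness between Document clustering and Cluster analysis rather than by shared surface words. Any standard clustering algorithm could then consume these pairwise similarities; the abstract does not say which one the authors used.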

Anna Huang, David N. Milne, Eibe Frank, Ian H. Witten

Concept-based Document Representation | Data Mining | Document | PAKDD 2009 | Similarity Measure

» Wikipedia-Based Kernels for Text Categorization

» Locally Consistent Concept Factorization for Document Clustering

» Concept Chain Based Text Clustering

» Clustering Documents with Active Learning Using Wikipedia

» Exploiting Wikipedia as external knowledge for document clustering

» Narrowing the semantic gap improved text-based web document retrieval using visual feature...

» Ontology-based Text Document Clustering

» Automatic Generation of Information-seeking Questions Using Concept Clusters

Post Info

Added	20 May 2010
Updated	20 May 2010
Type	Conference
Year	2009
Where	PAKDD
Authors	Anna Huang, David N. Milne, Eibe Frank, Ian H. Witten

Sciweavers
