Enhancing text clustering by leveraging Wikipedia semantics

Most traditional text clustering methods rely on a "bag of words" (BOW) representation built from term-frequency statistics over a set of documents. BOW, however, ignores important information about the semantic relationships between key terms. To overcome this problem, several methods have been proposed to enrich the text representation with external resources such as WordNet. However, many of these approaches suffer from limitations: 1) WordNet has limited coverage and lacks effective word-sense disambiguation ability; 2) most text representation enrichment strategies, which append or replace document terms with their hypernyms and synonyms, are overly simple. In this paper, to overcome these deficiencies, we first propose a way to build a concept thesaurus based on the semantic relations (synonym, hypernym, and associative relation) extracted from Wikipedia. Then, we develop a unified framework to leverage these semantic relations in order to enhance ...
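As a rough illustration of the baseline the abstract describes, the sketch below builds plain BOW vectors and then applies the simple "append related terms" enrichment that the abstract calls overly simple. It is not the authors' framework, which the truncated abstract does not specify. The thesaurus entries, the `weight` parameter, and all function names are hypothetical and chosen only for this example.

```python
import math
from collections import Counter

# Hypothetical toy thesaurus (term -> related terms); not taken from the paper.
TOY_THESAURUS = {
    "puma": ["cougar", "cat"],
    "cougar": ["puma", "cat"],
}


def bag_of_words(text):
    """Return term-frequency counts for a whitespace-tokenized document."""
    return Counter(text.lower().split())


def enrich_by_appending(bow, thesaurus, weight=0.5):
    """Append related terms to a BOW vector with a reduced weight.

    This is the naive 'append synonyms/hypernyms' strategy, with no
    word-sense disambiguation; `weight` is an assumed illustrative value.
    """
    enriched = Counter(bow)
    for term, count in bow.items():
        for related in thesaurus.get(term, []):
            enriched[related] += weight * count
    return enriched


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


doc1 = bag_of_words("the puma hunts at night")
doc2 = bag_of_words("a cougar was seen near the river")

plain_sim = cosine(doc1, doc2)
enriched_sim = cosine(
    enrich_by_appending(doc1, TOY_THESAURUS),
    enrich_by_appending(doc2, TOY_THESAURUS),
)
print(f"plain BOW similarity:    {plain_sim:.3f}")
print(f"enriched BOW similarity: {enriched_sim:.3f}")
```

With plain BOW the two documents share only the word "the", so "puma" and "cougar" contribute nothing to their similarity. Appending related terms raises the similarity, but every sense of an ambiguous word is expanded blindly, which is the kind of weakness the abstract points to.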

Jian Hu, Lujun Fang, Yang Cao, Hua-Jun Zeng, Hua Li, Qiang Yang, Zheng Chen

Information Technology | SIGIR 2008 | Text Clustering | Text Clustering Methods | Text Representation

Post Info

Added	15 Dec 2010
Updated	15 Dec 2010
Type	Conference
Year	2008
Where	SIGIR
Authors	Jian Hu, Lujun Fang, Yang Cao, Hua-Jun Zeng, Hua Li, Qiang Yang, Zheng Chen
