Sciweavers

EMNLP
2006

Graph-based Word Clustering using a Web Search Engine

Word clustering is important for automatic thesaurus construction, text classification, and word sense disambiguation. Recently, several studies have reported using the web as a corpus. This paper proposes an unsupervised algorithm for word clustering based on a word similarity measure computed from web counts. Each pair of words is submitted as a query to a search engine, and the resulting hit counts form a co-occurrence matrix. Computing the similarity of each word pair from this matrix yields a word co-occurrence graph. A recent graph clustering algorithm, Newman clustering, is then applied to identify word clusters efficiently. The method is evaluated on two sets of word groups, one derived from a web directory and one from WordNet.
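The pipeline in the abstract (web counts, pairwise similarity, graph, Newman clustering) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not name the similarity measure, so pointwise mutual information (PMI) is assumed here. The hit counts, the index size `total`, the threshold, and all function names are hypothetical. The clustering step is a naive version of Newman's greedy modularity merging that stops at the first merge that fails to raise modularity.

```python
import math
from itertools import combinations

def similarity_graph(words, single, pair, total, threshold=1.0):
    """Return unweighted edges between words whose PMI exceeds the threshold.

    PMI is an assumed similarity measure; `single` and `pair` hold
    hypothetical search-engine hit counts, `total` a hypothetical index size.
    """
    edges = {}
    for a, b in combinations(words, 2):
        n_ab = pair.get(frozenset((a, b)), 0)
        if n_ab == 0:
            continue  # no co-occurrence, no edge
        pmi = math.log(n_ab * total / (single[a] * single[b]))
        if pmi > threshold:
            edges[frozenset((a, b))] = 1.0
    return edges

def modularity(edges, communities):
    """Modularity Q = sum over communities of (inside/m - (degree/2m)^2)."""
    m = sum(edges.values())
    if m == 0:
        return 0.0
    q = 0.0
    for comm in communities:
        inside = sum(w for e, w in edges.items() if e <= comm)
        degree = sum(w for e, w in edges.items() for v in e if v in comm)
        q += inside / m - (degree / (2 * m)) ** 2
    return q

def newman_clusters(words, edges):
    """Greedily merge the pair of communities that most increases modularity."""
    communities = [frozenset([w]) for w in words]
    best_q = modularity(edges, communities)
    while len(communities) > 1:
        candidates = []
        for i, j in combinations(range(len(communities)), 2):
            rest = [c for k, c in enumerate(communities) if k not in (i, j)]
            partition = rest + [communities[i] | communities[j]]
            candidates.append((modularity(edges, partition), partition))
        q, partition = max(candidates, key=lambda t: t[0])
        if q <= best_q:
            break  # simplification: stop at the first non-improving merge
        best_q, communities = q, partition
    return communities, best_q

# Hypothetical hit counts for a toy vocabulary (not real search results).
fruits = ["apple", "banana", "cherry"]
devices = ["printer", "scanner", "monitor"]
words = fruits + devices
single = {w: 1000 for w in words}
pair = {frozenset(p): 100 for group in (fruits, devices)
        for p in combinations(group, 2)}
pair[frozenset(("cherry", "printer"))] = 1  # weak cross-group co-occurrence
total = 1_000_000

edges = similarity_graph(words, single, pair, total)
clusters, q = newman_clusters(words, edges)
for c in clusters:
    print(sorted(c))
```

On this toy input the weak cross-group pair falls below the PMI threshold, so the graph consists of two disconnected triangles and the greedy merging recovers the two word groups. Recomputing modularity from scratch for every candidate merge keeps the sketch short but is far slower than Newman's incremental update, so it is suitable only for small vocabularies.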
Yutaka Matsuo, Takeshi Sakaki, Koki Uchiyama, Mitsuru Ishizuka
Added 30 Oct 2010
Updated 30 Oct 2010
Type Conference
Year 2006
Where EMNLP
Authors Yutaka Matsuo, Takeshi Sakaki, Koki Uchiyama, Mitsuru Ishizuka