Graph-based Word Clustering using a Web Search Engine



Word clustering is important for automatic thesaurus construction, text classification, and word sense disambiguation. Recently, several studies have reported using the web as a corpus. This paper proposes an unsupervised algorithm for word clustering based on a word similarity measure computed from web counts. Each pair of words is submitted as a query to a search engine, and the resulting hit counts form a co-occurrence matrix. By calculating the similarity of words from this matrix, a word co-occurrence graph is obtained. A new kind of graph clustering algorithm, called Newman clustering, is then applied to identify word clusters efficiently. Evaluations are made on two sets of word groups derived from a web directory and WordNet.
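The pipeline described in the abstract (pairwise counts, similarity, graph, Newman clustering) can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the authors' implementation: the hit counts are made-up numbers standing in for search engine results, the Jaccard coefficient is just one possible similarity measure, the threshold value is arbitrary, and the function names (jaccard, build_graph, newman_clusters) are hypothetical. The clustering step follows the general idea of Newman's greedy modularity merging.

```python
from itertools import combinations

def jaccard(count_a, count_b, count_ab):
    """Jaccard coefficient from hit counts (an illustrative similarity choice)."""
    union = count_a + count_b - count_ab
    return count_ab / union if union > 0 else 0.0

def build_graph(single_counts, pair_counts, threshold):
    """Link two words when their similarity reaches the threshold.

    pair_counts is keyed by alphabetically ordered word pairs.
    """
    edges = set()
    for a, b in combinations(sorted(single_counts), 2):
        co = pair_counts.get((a, b), 0)
        if jaccard(single_counts[a], single_counts[b], co) >= threshold:
            edges.add((a, b))
    return edges

def newman_clusters(nodes, edges):
    """Greedy agglomerative modularity maximisation on an unweighted graph."""
    m = len(edges)
    clusters = {n: {n} for n in nodes}  # representative word -> members
    if m == 0:
        return list(clusters.values())
    # e[(i, j)]: half the fraction of edges joining clusters i and j
    # a[i]: fraction of edge ends attached to cluster i
    e = {}
    a = {n: 0.0 for n in nodes}
    for u, v in edges:
        e[(u, v)] = e.get((u, v), 0.0) + 1 / (2 * m)
        e[(v, u)] = e.get((v, u), 0.0) + 1 / (2 * m)
        a[u] += 1 / (2 * m)
        a[v] += 1 / (2 * m)
    while True:
        # Modularity gain of merging i and j: dQ = 2 * (e_ij - a_i * a_j)
        candidates = [(2 * (w - a[i] * a[j]), i, j)
                      for (i, j), w in e.items() if i < j]
        if not candidates:
            break
        gain, i, j = max(candidates)
        if gain <= 0:
            break
        # Merge cluster j into cluster i and redirect its edges
        clusters[i] |= clusters.pop(j)
        for (x, y), w in list(e.items()):
            if x == j or y == j:
                del e[(x, y)]
                p = i if x == j else x
                q = i if y == j else y
                if p != q:  # edges inside the merged cluster are not tracked
                    e[(p, q)] = e.get((p, q), 0.0) + w
        a[i] += a.pop(j)
    return list(clusters.values())

# Hypothetical hit counts (not real search engine data)
fruit = ["apple", "banana", "cherry"]
vehicles = ["bus", "car", "truck"]
single_counts = {w: 1000 for w in fruit + vehicles}
pair_counts = {}
for a_word, b_word in combinations(sorted(single_counts), 2):
    same_group = (a_word in fruit) == (b_word in fruit)
    pair_counts[(a_word, b_word)] = 400 if same_group else 10

edges = build_graph(single_counts, pair_counts, threshold=0.1)
clusters = newman_clusters(sorted(single_counts), edges)
print(sorted(sorted(c) for c in clusters))
```

In a real setting the counts would come from search engine queries for each word and each word pair, and the similarity measure and threshold would need to be chosen and tuned for the data at hand.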

Yutaka Matsuo, Takeshi Sakaki, Koki Uchiyama, Mitsuru Ishizuka


EMNLP 2006 | EMNLP 2007 | Word Clustering | Word Sense Disambiguation | Word Similarity Measure


» Clustering of Search Engine Keywords Using Access Logs

» Measuring the similarity between implicit semantic relations using web search engines

» IGroup a web image search engine with semantic clustering of search results

» Hierarchical clustering of WWW image search results using visual textual and link informat...

» Measuring semantic similarity between words using web search engines

» Semantic-Based Grouping of Search Engine Results Using WordNet

» Using Content-Based and Link-Based Analysis in Building Vertical Search Engines

» Clustering e-commerce search engines based on their search interface pages using WISECluste...

Post Info

Added	30 Oct 2010
Updated	30 Oct 2010
Type	Conference
Year	2006
Where	EMNLP
Authors	Yutaka Matsuo, Takeshi Sakaki, Koki Uchiyama, Mitsuru Ishizuka
