ACL
2006

Novel Association Measures Using Web Search with Double Checking

A web search with double checking model is proposed to explore the web as a live corpus. Five association measures are presented: variants of Dice, Overlap Ratio, Jaccard, and Cosine, as well as Co-Occurrence Double Check (CODC). In experiments on Rubenstein-Goodenough's benchmark data set, the CODC measure achieves a correlation coefficient of 0.8492, which is competitive with the performance (0.8914) of the model using WordNet. Experiments on link detection of named entities, using the strategies of direct association, association matrix, and scalar association matrix, verify that the double-check frequencies are reliable. Further study on named entity clustering shows that the five measures are quite useful. In particular, the CODC measure is very stable in word-word and name-name experiments. Applying the CODC measure to expand community chains for personal name disambiguation achieves increases of 9.65% and 14.22% compared to the system without community expansion. All the e...
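For orientation, the four classical measures named in the abstract can be sketched from raw frequency counts. This is a minimal illustration of the textbook definitions only: the paper's double-check variants and the CODC formula are not given in this abstract and are not reproduced here. The function names and the arguments `fx`, `fy`, `fxy` (counts for term X, term Y, and their co-occurrence) are assumptions chosen for the example.

```python
import math

# Textbook association measures over frequency counts.
# Assumption: fx and fy are the counts of X and Y, and fxy is
# their co-occurrence count. These are NOT the paper's
# double-check variants, which this abstract does not define.

def dice(fx, fy, fxy):
    """Dice coefficient: 2 * |X and Y| / (|X| + |Y|)."""
    total = fx + fy
    return 2 * fxy / total if total else 0.0

def overlap(fx, fy, fxy):
    """Overlap ratio: |X and Y| / min(|X|, |Y|)."""
    smaller = min(fx, fy)
    return fxy / smaller if smaller else 0.0

def jaccard(fx, fy, fxy):
    """Jaccard coefficient: |X and Y| / |X or Y|."""
    union = fx + fy - fxy
    return fxy / union if union else 0.0

def cosine(fx, fy, fxy):
    """Cosine measure: |X and Y| / sqrt(|X| * |Y|)."""
    denom = math.sqrt(fx * fy)
    return fxy / denom if denom else 0.0
```

Each function returns 0.0 when its denominator is zero, so a term with no hits yields no association rather than an error.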
Hsin-Hsi Chen, Ming-Shun Lin, Yu-Chuan Wei
Added 30 Oct 2010
Updated 30 Oct 2010
Type Conference
Year 2006
Where ACL