

Resolving Surface Forms to Wikipedia Topics

Ambiguity of entity mentions and concept references is a challenge to mining text beyond surface-level keywords. We describe an effective method of disambiguating surface forms and resolving them to Wikipedia entities and concepts. Our method employs an extensive set of features mined from Wikipedia and other large data sources, and combines the features using a machine learning approach with automatically generated training data. Based on a manually labeled evaluation set containing over 1000 news articles, our resolution model has 85% precision and 87.8% recall. The performance is significantly better than three baselines based on traditional context similarities or sense commonness measurements. Our method can be applied to other languages and scales well to new entities and concepts.
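One of the baselines mentioned above relies on sense commonness. The abstract does not define the measure, so the following is only a minimal sketch under a common assumption: a surface form is resolved to the topic it most often links to in Wikipedia anchor text. The function names and the toy anchor data are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def build_commonness(anchor_links):
    """Count how often each (lowercased) surface form links to each topic."""
    counts = defaultdict(Counter)
    for surface, topic in anchor_links:
        counts[surface.lower()][topic] += 1
    return counts


def resolve_by_commonness(surface, counts):
    """Return (topic, share) for the most frequent topic, or None if unseen."""
    topics = counts.get(surface.lower())
    if not topics:
        return None
    topic, n = topics.most_common(1)[0]
    return topic, n / sum(topics.values())


# Toy (surface form, linked topic) pairs, invented for illustration only.
anchor_links = [
    ("Jaguar", "Jaguar_Cars"),
    ("Jaguar", "Jaguar_Cars"),
    ("Jaguar", "Jaguar"),
    ("jaguar", "Jaguar_Cars"),
    ("Java", "Java_(programming_language)"),
]

counts = build_commonness(anchor_links)
best = resolve_by_commonness("Jaguar", counts)
print(best)
```

A baseline of this kind ignores the surrounding context entirely, which is why it serves as a point of comparison for a model that combines many features.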
Added 13 May 2011
Updated 13 May 2011
Type Journal
Year 2010
Authors Yiping Zhou, Lan Nie, Omid Rouhani-Kalleh, Flavian Vasile, Scott Gaffney