Sciweavers

CIKM
2009
Springer

Helping editors choose better seed sets for entity set expansion

14 years 6 months ago
Helping editors choose better seed sets for entity set expansion
Sets of named entities are used heavily at commercial search engines such as Google, Yahoo and Bing. Acquiring sets of entities typically consists of combining semi-supervised expansion algorithms with manual cleaning of the resulting expanded sets. In this paper, we study the effects of different seed sets in a state-of-the-art semi-supervised expansion system and show a tremendous variation in expansion performance depending on the choice of seeds. We further show that human editors, in general, provide very bad seed sets, which perform well-below the average random seed set. We identify three factors of seed set composition, namely prototypicality, ambiguity and coverage, and we investigate their effects on expansion performance. Finally, we propose various automatic systems for improving editor-generated seed sets, which seek to remove ambiguous and other error-prone seed instances. An extensive experimental analysis shows that expansion quality, measured in R-precision, can be...
Vishnu Vyas, Patrick Pantel, Eric Crestan
Added 26 May 2010
Updated 26 May 2010
Type Conference
Year 2009
Where CIKM
Authors Vishnu Vyas, Patrick Pantel, Eric Crestan
Comments (0)