Helping editors choose better seed sets for entity set expansion

Sets of named entities are used heavily at commercial search engines such as Google, Yahoo and Bing. Acquiring sets of entities typically consists of combining semi-supervised expansion algorithms with manual cleaning of the resulting expanded sets. In this paper, we study the effects of different seed sets in a state-of-the-art semi-supervised expansion system and show a tremendous variation in expansion performance depending on the choice of seeds. We further show that human editors, in general, provide very bad seed sets, which perform well below the average random seed set. We identify three factors of seed set composition, namely prototypicality, ambiguity and coverage, and we investigate their effects on expansion performance. Finally, we propose various automatic systems for improving editor-generated seed sets, which seek to remove ambiguous and other error-prone seed instances. An extensive experimental analysis shows that expansion quality, measured in R-precision, can be...

Vishnu Vyas, Patrick Pantel, Eric Crestan

CIKM 2009 | Database | Expansion Performance | Seed Set | Semi-supervised Expansion |

Post Info

Added	26 May 2010
Updated	26 May 2010
Type	Conference
Year	2009
Where	CIKM
Authors	Vishnu Vyas, Patrick Pantel, Eric Crestan
