Sciweavers

DL
2000
Springer

Snowball: extracting relations from large plain-text collections

Text documents often contain valuable structured data that is hidden in regular English sentences. This data is best exploited if available as a relational table that we could use for answering precise queries or for running data mining tasks. We explore a technique for extracting such tables from document collections that requires only a handful of training examples from users. These examples are used to generate extraction patterns, which in turn result in new tuples being extracted from the document collection. We build on this idea and present our Snowball system. Snowball introduces novel strategies for generating patterns and extracting tuples from plain-text documents. At each iteration of the extraction process, Snowball evaluates the quality of these patterns and tuples without human intervention, and keeps only the most reliable ones for the next iteration. In this paper, we also develop a scalable evaluation methodology and metrics for our task, and present a thorough experim...
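The iterative loop described in the abstract (seed tuples produce patterns, patterns produce new tuples, and only reliable patterns survive) can be illustrated with a small sketch. This is a toy under stated assumptions, not the Snowball implementation: the capitalized-word entity matcher, the use of the literal text between two entities as a "pattern", the agreement-based confidence score, the `min_conf` threshold, and all function names and sample sentences are hypothetical choices made for illustration.

```python
import re

# Assumption: an "entity" is a run of capitalized words. A real system
# would use a named-entity tagger instead of this regular expression.
ENTITY = r"[A-Z][a-zA-Z]*(?: [A-Z][a-zA-Z]*)*"


def find_patterns(sentences, tuples):
    """Collect the text found between the two entities of each known tuple."""
    patterns = set()
    for org, loc in tuples:
        regex = re.compile(re.escape(org) + r"(.{1,30}?)" + re.escape(loc))
        for sentence in sentences:
            match = regex.search(sentence)
            if match:
                patterns.add(match.group(1))
    return patterns


def apply_pattern(sentences, middle):
    """Extract every (entity, entity) pair joined by the given middle text."""
    regex = re.compile(f"({ENTITY}){re.escape(middle)}({ENTITY})")
    found = set()
    for sentence in sentences:
        for match in regex.finditer(sentence):
            found.add((match.group(1), match.group(2)))
    return found


def pattern_confidence(extracted, known):
    """Share of extractions that agree with known tuples for the same organization."""
    known_locations = {}
    for org, loc in known:
        known_locations.setdefault(org, set()).add(loc)
    positive = negative = 0
    for org, loc in extracted:
        if org in known_locations:
            if loc in known_locations[org]:
                positive += 1
            else:
                negative += 1
    total = positive + negative
    return positive / total if total else 0.0


def bootstrap(sentences, seeds, iterations=2, min_conf=0.6):
    """Grow the tuple set, keeping only output from patterns that pass min_conf."""
    tuples = set(seeds)
    for _ in range(iterations):
        accepted = set()
        for middle in find_patterns(sentences, tuples):
            extracted = apply_pattern(sentences, middle)
            if pattern_confidence(extracted, tuples) >= min_conf:
                accepted |= extracted
        tuples |= accepted
    return tuples


# Hypothetical sample data, invented for this sketch.
SENTENCES = [
    "Microsoft is headquartered in Redmond.",
    "Boeing is headquartered in Seattle.",
    "Intel is headquartered in Santa Clara.",
    "Microsoft, based in Redmond, released a product.",
    "Exxon, based in Irving, reported earnings.",
    "Microsoft and Redmond officials met.",
    "Microsoft and Boeing signed a deal.",
]

if __name__ == "__main__":
    for pair in sorted(bootstrap(SENTENCES, {("Microsoft", "Redmond")})):
        print(pair)
```

In this sample, a single seed pair yields three candidate patterns. The " and " pattern also produces a pair that contradicts the seed, so its score falls below the threshold and its output is discarded, while the two remaining patterns contribute new pairs without any human labeling.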
Eugene Agichtein, Luis Gravano
Added 02 Aug 2010
Updated 02 Aug 2010
Type Conference
Year 2000
Where DL
Authors Eugene Agichtein, Luis Gravano