Methods for Domain-Independent Information Extraction from the Web: An Experimental Comparison



Our KNOWITALL system aims to automate the tedious process of extracting large collections of facts (e.g., names of scientists or politicians) from the Web in an autonomous, domain-independent, and scalable manner. In its first major run, KNOWITALL extracted over 50,000 facts with high precision, but suggested a challenge: How can we improve KNOWITALL's recall and extraction rate without sacrificing precision? This paper presents three distinct ways to address this challenge and evaluates their performance. Rule Learning learns domain-specific extraction rules. Subclass Extraction automatically identifies sub-classes in order to boost recall. List Extraction locates lists of class instances, learns a "wrapper" for each list, and extracts elements of each list. Since each method bootstraps from KNOWITALL's domain-independent methods, no hand-labeled training examples are required. Experiments show the relative coverage of each method and demonstrate their synergy. In...
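The abstract describes List Extraction only at a high level: locate a list, learn a "wrapper" for it, and extract its elements. The sketch below is one minimal way to illustrate that idea and is not KNOWITALL's actual List Extractor. It assumes that a few seed instances are already known (for example, from an earlier extraction pass) and that every list element sits between the same pair of HTML tags. The function names `learn_wrapper` and `apply_wrapper` and the sample page are hypothetical.

```python
import re
from os.path import commonprefix  # character-wise common prefix of strings


def learn_wrapper(html, seeds):
    """Infer a (prefix, suffix) tag pair that surrounds every seed in html.

    Illustrative assumption: each seed appears once, directly between
    the tags that delimit a list element.
    """
    lefts, rights = [], []
    for seed in seeds:
        pos = html.find(seed)
        if pos == -1:
            return None  # a seed is missing, so this page is not usable
        lefts.append(html[:pos])
        rights.append(html[pos + len(seed):])

    # Longest shared text immediately before and after the seeds.
    shared_left = commonprefix([s[::-1] for s in lefts])[::-1]
    shared_right = commonprefix(rights)

    # Trim to the nearest whole tag so consecutive matches do not overlap.
    start = shared_left.rfind("<")
    end = shared_right.find(">")
    if start == -1 or end == -1:
        return None
    return shared_left[start:], shared_right[:end + 1]


def apply_wrapper(html, wrapper):
    """Return every tag-free string enclosed by the learned wrapper."""
    prefix, suffix = wrapper
    pattern = re.escape(prefix) + r"([^<>]+)" + re.escape(suffix)
    return re.findall(pattern, html)


if __name__ == "__main__":
    page = (
        "<p>Physicists</p><ul><li>Einstein</li><li>Curie</li>"
        "<li>Bohr</li><li>Feynman</li></ul>"
    )
    wrapper = learn_wrapper(page, ["Einstein", "Curie"])
    print(wrapper)
    print(apply_wrapper(page, wrapper))
```

Because the wrapper is induced from seeds rather than from hand-labeled pages, the sketch mirrors the abstract's point that no hand-labeled training examples are required. A real system would also need to verify the extracted candidates, which this example does not attempt.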

Oren Etzioni, Michael J. Cafarella, Doug Downey, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, Alexander Yates


AAAI 2004 | Domain-specific Extraction Rules | Extraction Locates Lists | Intelligent Agents | Subclass Extraction |


» Automatic Construction of a Semantic Domain-Independent Knowledge Base

» Extracting Instances of Relations from Web Documents Using Redundancy

» WebSets: extracting sets of entities from the web using unsupervised information extraction

» A Redundancy-Based Method for Relation Instantiation from the Web

» Mining templates from search result records of search engines

» Unsupervised named-entity extraction from the Web: An experimental study

» Extracting data records from the web using tag path clustering

» A Method for Automatically Generating a Mediatory Summary to Verify Credibility of Informa...

Post Info

Added	30 Oct 2010
Updated	30 Oct 2010
Type	Conference
Year	2004
Where	AAAI
Authors	Oren Etzioni, Michael J. Cafarella, Doug Downey, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, Alexander Yates
