WEBDB
2007
Springer

Navigating Extracted Data with Schema Discovery

Open Information Extraction (OIE) is a recently introduced type of information extraction that extracts small, individual pieces of data from input text without any domain-specific guidance, such as special training data or extraction rules. For example, an OIE system might discover the triple (Frenzy, year, 1972) from a set of documents about movies. Because OIE is domain-independent, it promises to help users who have a corpus containing structured data whose structure is unknown, such as when browsing a novel domain or formulating a query. We can describe the structure to the user by displaying a relational schema that fits the extracted data. Unfortunately, the extractions do not carry full schema information: we have extracted values, but not the correct relations, their rows, or their columns. In response, we propose TGen, an algorithm for schema discovery that automatically derives a high-quality relational schema for the extracted data. Different applications have different ...
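To make the gap between extracted triples and a relational schema concrete, here is a minimal Python sketch. It is not TGen: the abstract does not describe that algorithm, so this swaps in a naive baseline that groups subjects by the exact set of attributes they carry and treats each group as a candidate relation. The function name `naive_schema` and every triple other than (Frenzy, year, 1972) are assumptions added for illustration.

```python
from collections import defaultdict

# Hypothetical extracted triples (subject, attribute, value). Only the first
# one appears in the abstract; the others are illustrative assumptions.
triples = [
    ("Frenzy", "year", "1972"),
    ("Frenzy", "director", "Alfred Hitchcock"),
    ("Psycho", "year", "1960"),
    ("Psycho", "director", "Alfred Hitchcock"),
    ("Alfred Hitchcock", "born", "1899"),
]


def naive_schema(triples):
    """Group subjects by the exact set of attributes they carry.

    Each distinct attribute set becomes one candidate relation whose
    columns are the sorted attribute names. This is a naive baseline for
    illustration only, not the TGen algorithm.
    """
    # Collect all attribute values per subject: one would-be row each.
    rows = defaultdict(dict)
    for subject, attribute, value in triples:
        rows[subject][attribute] = value

    # Subjects sharing the same attribute set land in the same relation.
    relations = defaultdict(list)
    for subject, attrs in rows.items():
        columns = tuple(sorted(attrs))
        relations[columns].append((subject,) + tuple(attrs[c] for c in columns))
    return dict(relations)


schema = naive_schema(triples)
for columns, table in schema.items():
    print(("subject",) + columns, table)
```

Under these assumptions the two movies fall into one relation with columns (director, year) and the person into another with column (born). Real extractions are noisier: subjects rarely share identical attribute sets, which is one reason exact grouping like this is unlikely to yield a usable schema on its own.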
Michael J. Cafarella, Dan Suciu, Oren Etzioni
Added 09 Jun 2010
Updated 09 Jun 2010
Type Conference
Year 2007
Where WEBDB