Navigating Extracted Data with Schema Discovery



Open Information Extraction (OIE) is a recently-introduced type of information extraction that extracts small individual pieces of data from input text without any domain-specific guidance such as special training data or extraction rules. For example, an OIE system might discover the triple (Frenzy, year, 1972) from a set of documents about movies. Because OIE is domain-independent, it promises to help users when they have a corpus of structured data, but that structure is unknown, such as when browsing a novel domain or formulating a query. We can describe the structure to the user by displaying a relational schema that fits the extracted data. Unfortunately, the extractions do not carry full schema information: we have extracted values, but not the correct relations, their rows, or their columns. In response we propose TGen, an algorithm for schema discovery, which automatically derives a high-quality relational schema for the extracted data. Different applications have different ...
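To make the gap between raw extractions and a relational schema concrete, here is a minimal Python sketch. It is not TGen, whose details are not given in the abstract above. It only pivots subject/attribute/value triples into a single table, using each distinct attribute as a column. The `pivot` helper and every triple other than the Frenzy example are assumptions added for illustration.

```python
from collections import defaultdict

# Illustrative extractions. Only the first triple appears in the abstract;
# the others are assumed sample data for this sketch.
triples = [
    ("Frenzy", "year", "1972"),
    ("Frenzy", "director", "Alfred Hitchcock"),
    ("Vertigo", "year", "1958"),
]


def pivot(triples):
    """Naively pivot (subject, attribute, value) triples into one table.

    This is NOT the TGen algorithm. It simply treats every subject as a row
    and every distinct attribute as a column, leaving missing cells as None.
    """
    rows = defaultdict(dict)
    for subject, attribute, value in triples:
        rows[subject][attribute] = value

    # Columns are the distinct attribute names, sorted for a stable order.
    columns = sorted({attribute for _, attribute, _ in triples})
    table = [
        [subject] + [rows[subject].get(column) for column in columns]
        for subject in sorted(rows)
    ]
    return ["subject"] + columns, table


header, table = pivot(triples)
print(header)
for row in table:
    print(row)
```

A single wide table like this fills with empty cells as soon as the corpus mixes entity types. Deciding which relations, rows, and columns to use instead is the schema discovery problem the abstract describes.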

Michael J. Cafarella, Dan Suciu, Oren Etzioni


Extracted Data | Information Extraction | Internet Technology | Relational Schema | WEBDB 2007 |


Added	09 Jun 2010
Updated	09 Jun 2010
Type	Conference
Year	2007
Where	WEBDB
Authors	Michael J. Cafarella, Dan Suciu, Oren Etzioni
