Efficient Information Extraction over Evolving Text Data

Abstract: Most current information extraction (IE) approaches consider only static text corpora, over which IE typically has to be applied only once. Many real-world text corpora, however, are dynamic. They evolve over time, and to keep the extracted information up to date, IE must often be applied repeatedly, to consecutive corpus snapshots. We describe Cyclex, an approach that efficiently executes such repeated IE by recycling previous IE efforts. Specifically, given a current corpus snapshot U, Cyclex identifies text portions of U that also appear in the previous corpus snapshot V. Since Cyclex has already executed IE over V, it can now recycle the IE results for these portions, combining them with the results of executing IE over the remaining parts of U to produce the complete IE results for U. Realizing Cyclex raises many challenges, including modeling information extractors, exploring the trade-off between runtime and completeness in identifying overlapping text, and ...
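The recycling idea in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Cyclex itself: `difflib.SequenceMatcher` stands in for Cyclex's detection of text shared by U and V, and the extractor is a hypothetical one that reports every maximal run of digits. That extractor needs only one character of context on each side of a mention, so a mention found in V can be copied to U whenever it and its two neighboring characters fall inside a region that SequenceMatcher reports as unchanged; every other part of U is re-extracted. The names `extract`, `recycle_extract`, and `DIGITS` are illustrative only.

```python
import re
from difflib import SequenceMatcher

# Hypothetical toy extractor: each maximal run of decimal digits is one mention.
# Whether a run is maximal depends only on the one character on each side of it.
DIGITS = re.compile(r"\d+")


def _is_digit(ch):
    # Same character class as DIGITS, so window edges agree with the extractor.
    return DIGITS.fullmatch(ch) is not None


def extract(text, start=0, end=None):
    """Return (begin, end) spans of all digit runs in text[start:end]."""
    end = len(text) if end is None else end
    return [(m.start(), m.end()) for m in DIGITS.finditer(text, start, end)]


def recycle_extract(old_text, old_mentions, new_text):
    """Mentions of new_text, reusing old_mentions (from old_text) where possible."""
    n = len(new_text)
    interior = [False] * n  # positions whose result can be copied from old_text
    recycled = set()

    matcher = SequenceMatcher(None, old_text, new_text, autojunk=False)
    for a, b, size in matcher.get_matching_blocks():
        # old_text[a:a+size] == new_text[b:b+size]. Treat all but the first and
        # last character of the copied block as "interior".
        for i in range(b + 1, b + size - 1):
            interior[i] = True
        # Reuse an old mention only if it and one character of context on each
        # side lie inside the block; then it is still a maximal run in new_text.
        for s, e in old_mentions:
            if a + 1 <= s and e + 1 <= a + size:
                recycled.add((s - a + b, e - a + b))

    # Re-run the extractor only over maximal non-interior ("dirty") windows.
    fresh = set()
    i = 0
    while i < n:
        if interior[i]:
            i += 1
            continue
        j = i
        while j < n and not interior[j]:
            j += 1
        # Widen [i, j) so its edges never cut through a digit run.
        lo, hi = i, j
        while lo > 0 and _is_digit(new_text[lo - 1]):
            lo -= 1
        while hi < n and _is_digit(new_text[hi]):
            hi += 1
        fresh.update(extract(new_text, lo, hi))
        i = j

    # A span may be found both ways; the set union keeps one copy.
    return sorted(recycled | fresh)
```

For example, `recycle_extract("a 12 b", extract("a 12 b"), "x a 12 b")` returns `[(4, 6)]` by shifting the old span instead of rescanning it. The sketch stays correct whichever shared regions the matcher reports; a cheaper matcher that finds fewer of them only leaves more of U to re-extract, which loosely mirrors the runtime/completeness trade-off the abstract mentions.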

Fei Chen 0002, AnHai Doan, Jun Yang 0001, Raghu Ramakrishnan

Corpus Snapshot U | Database | ICDE 2008 | Real-world Text Corpora | Static Text Corpora

» Optimizing Statistical Information Extraction Programs over Evolving Text

» A Pragmatic Information Extraction Strategy for Gathering Data on Genetic Interactions

» SQL Queries Over Unstructured Text Databases

» Building query optimizers for information extraction: the SQoUT project

» Optimizing SQL Queries over Text Databases

» Declarative Information Extraction Using Datalog with Embedded Extraction Predicates

» Query-driven indexing for peer-to-peer text retrieval

» Extensive Evaluation of Efficient NLP-Driven Text Classification

Post Info

Added	01 Nov 2009
Updated	01 Nov 2009
Type	Conference
Year	2008
Where	ICDE
Authors	Fei Chen 0002, AnHai Doan, Jun Yang 0001, Raghu Ramakrishnan

Sciweavers