Sciweavers

ICDE
2008
IEEE

Efficient Information Extraction over Evolving Text Data

Abstract: Most current information extraction (IE) approaches have considered only static text corpora, over which we typically have to apply IE only once. Many real-world text corpora, however, are dynamic. They evolve over time, and to keep extracted information up to date, we often must apply IE repeatedly, to consecutive corpus snapshots. We describe Cyclex, an approach that efficiently executes such repeated IE by recycling previous IE efforts. Specifically, given a current corpus snapshot U, Cyclex identifies text portions of U that also appear in the previous corpus snapshot V. Since Cyclex has already executed IE over V, it can now recycle the IE results of these parts, combining them with the results of executing IE over the remaining parts of U to produce the complete IE results for U. Realizing Cyclex raises many challenges, including modeling information extractors, exploring the trade-off between runtime and completeness in identifying overlapping text, and ...
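The recycling idea in the abstract can be illustrated with a minimal sketch. It is not the Cyclex implementation: it assumes a toy extractor (`extract_years`, a hypothetical name) whose mentions never span a line break, and it detects overlap between snapshots by exact line match, which is only one simple point on the runtime/completeness trade-off the abstract mentions. All function and variable names are illustrative.

```python
import re
from typing import Callable, Dict, List, Tuple

# A mention inside one line: (start offset, end offset, matched text).
Mention = Tuple[int, int, str]
# A mention inside a snapshot: (line number, start offset, end offset, text).
Located = Tuple[int, int, int, str]


def extract_years(line: str) -> List[Mention]:
    """Toy extractor (assumption): four-digit years, never spanning lines."""
    pattern = r"\b(?:19|20)\d{2}\b"
    return [(m.start(), m.end(), m.group()) for m in re.finditer(pattern, line)]


def extract_snapshot(
    snapshot: str,
    extractor: Callable[[str], List[Mention]],
    cache: Dict[str, List[Mention]],
) -> Tuple[List[Located], Dict[str, List[Mention]], int]:
    """Run IE over a snapshot, reusing results for lines seen in `cache`.

    Returns the located mentions, a cache built from this snapshot (to be
    passed to the next run), and the number of extractor invocations.
    """
    results: List[Located] = []
    new_cache: Dict[str, List[Mention]] = {}
    extractor_calls = 0
    for line_no, line in enumerate(snapshot.splitlines()):
        if line in cache:            # text also present in the previous snapshot
            mentions = cache[line]   # recycle the earlier IE result
        elif line in new_cache:      # repeated line within this snapshot
            mentions = new_cache[line]
        else:                        # genuinely new text: run the extractor
            mentions = extractor(line)
            extractor_calls += 1
        new_cache[line] = mentions
        results.extend((line_no, s, e, t) for s, e, t in mentions)
    return results, new_cache, extractor_calls


# Previous snapshot V and current snapshot U (made-up example text).
V = "ICDE was held in 2008.\nCyclex recycles IE results."
U = "ICDE was held in 2008.\nCyclex recycles IE results.\nAdded in 2009."

_, cache_v, calls_v = extract_snapshot(V, extract_years, {})
results_u, _, calls_u = extract_snapshot(U, extract_years, cache_v)
```

In this sketch the second run invokes the extractor only on the one line of U that does not occur in V, yet returns the same mentions as extracting U from scratch. Line-level exact matching is the simplifying assumption that makes recycling safe here; an extractor whose mentions depend on wider context would need a more careful notion of reusable region.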
Added 01 Nov 2009
Updated 01 Nov 2009
Type Conference
Year 2008
Where ICDE
Authors Fei Chen 0002, AnHai Doan, Jun Yang 0001, Raghu Ramakrishnan