
PVLDB
2010

Automatic Rule Refinement for Information Extraction

Rule-based information extraction from text is increasingly being used to populate databases and to support structured queries on unstructured text. Specification of suitable information extraction rules requires considerable skill, and standard practice is to refine rules iteratively, with substantial effort. In this paper, we show that techniques developed in the context of data provenance, to determine the lineage of a tuple in a database, can be leveraged to assist in rule refinement. Specifically, given a set of extraction rules and correct and incorrect extracted data, we have developed a technique to suggest a ranked list of rule modifications that an expert rule specifier can consider. We implemented our technique in the SystemT information extraction system developed at IBM Research
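The idea in the abstract (trace each extracted result back to the rule that produced it, then rank candidate rule changes against labeled results) can be illustrated with a toy sketch. This is not the paper's algorithm and does not use SystemT or its rule language. The rule names, the candidate filters, the scoring formula (incorrect results removed minus correct results lost), and the sample data are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extraction:
    text: str    # extracted span
    rule: str    # provenance: name of the (hypothetical) rule that produced it
    label: bool  # True if a reviewer marked the span as correct


def rank_refinements(extractions, candidates):
    """Rank candidate filters by (incorrect results removed - correct results lost).

    Each candidate is (name, rule, keep), where keep(text) returns True when
    the refined rule would still emit the span. This scoring is an assumption
    of the sketch, not the ranking used in the paper.
    """
    ranked = []
    for name, rule, keep in candidates:
        # Provenance tells us which outputs a change to this rule can affect.
        affected = [e for e in extractions if e.rule == rule]
        removed = [e for e in affected if not keep(e.text)]
        fp_removed = sum(1 for e in removed if not e.label)
        tp_lost = sum(1 for e in removed if e.label)
        ranked.append((fp_removed - tp_lost, name))
    # Highest score first; ties are broken alphabetically by name.
    ranked.sort(key=lambda item: (-item[0], item[1]))
    return ranked


# Made-up labeled output of a hypothetical person-name extractor.
extractions = [
    Extraction("Alice Smith", "CapsPair", True),
    Extraction("Bob Jones", "CapsPair", True),
    Extraction("Acme Research", "CapsPair", False),
    Extraction("New York", "CapsPair", False),
    Extraction("Dr. Lee", "TitleName", True),
]

# Hypothetical refinements: each adds a filter predicate to one rule.
candidates = [
    ("filter_dictionary", "CapsPair",
     lambda t: t.split()[-1] not in {"Research", "York"}),
    ("drop_new_prefix", "CapsPair", lambda t: not t.startswith("New")),
    ("max_len_9", "CapsPair", lambda t: len(t) <= 9),
]

for score, name in rank_refinements(extractions, candidates):
    print(score, name)
```

In this sketch the dictionary filter ranks first because it removes both incorrect spans without losing a correct one, while the length filter ranks last because it also drops a correct name. The ranked list is meant as a set of suggestions for a human rule writer to review, as the abstract describes.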
Added 20 May 2011
Updated 20 May 2011
Type Journal
Year 2010
Where PVLDB
Authors Bin Liu 0002, Laura Chiticariu, Vivian Chu, H. V. Jagadish, Frederick Reiss