PKDD
2007
Springer

Using the Web to Reduce Data Sparseness in Pattern-Based Information Extraction

Textual patterns have been used effectively to extract information from large text collections. However, they rely heavily on textual redundancy, in the sense that facts have to be mentioned in a similar manner in order to be generalized to a textual pattern. Data sparseness thus becomes a problem when trying to extract information from sources with little redundancy, such as corporate intranets, encyclopedic works, or scientific databases. We present results on applying a weakly supervised pattern induction algorithm to Wikipedia to extract instances of arbitrary relations. In particular, we apply different configurations of a basic algorithm for pattern induction to seven different datasets. We show that the lack of redundancy creates a need for a large amount of training data, but that integrating Web extraction into the process leads to a significant reduction of required training data while maintaining the accuracy of Wikipedia. In particular we show that, though the use of the Web can hav...
Sebastian Blohm, Philipp Cimiano
Type Conference
Year 2007
Where PKDD
Authors Sebastian Blohm, Philipp Cimiano