ICDE
2009
IEEE

Join Optimization of Information Extraction Output: Quality Matters!

Information extraction (IE) systems are trained to extract specific relations from text databases. Real-world applications often require that the output of multiple IE systems be joined to produce the data of interest. To optimize the execution of a join of multiple extracted relations, it is not sufficient to consider only execution time. In fact, the quality of the join output is of critical importance: unlike in the relational world, different join execution plans can produce join results of widely different quality whenever IE systems are involved. In this paper, we develop a principled approach to understand, estimate, and incorporate output quality into the join optimization process over extracted relations. We argue that the output quality is affected by (a) the configuration of the IE systems used to process documents, (b) the document retrieval strategies used to retrieve documents, and (c) the actual join algorithm used. Our analysis considers several alternatives for ...
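The abstract's central claim is that the fastest join plan over extracted relations need not produce acceptable output. A toy cost/quality model can illustrate this. The sketch below is not the paper's model. `ExtractorConfig`, `Retrieval`, `Plan`, `estimate`, and `choose_plan` are hypothetical names, and all precision, recall, and cost numbers are made up. The model assumes that extraction errors in the two relations are independent, so a join tuple is correct only if both of its component tuples are. It also holds the join algorithm fixed, varying only the IE configuration and the retrieval strategy. It picks the fastest plan whose estimated output meets a minimum number of good tuples and a maximum number of bad ones.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ExtractorConfig:
    """Hypothetical IE system setting with a precision/recall/cost trade-off."""
    name: str
    precision: float    # assumed fraction of extracted tuples that are correct
    recall: float       # assumed fraction of correct tuples (in processed docs) extracted
    sec_per_doc: float  # assumed processing cost per document


@dataclass(frozen=True)
class Retrieval:
    """Hypothetical document retrieval strategy."""
    name: str
    doc_fraction: float  # assumed fraction of relevant documents it reaches
    num_docs: int        # documents it sends through the extractors


@dataclass(frozen=True)
class Plan:
    left: ExtractorConfig
    right: ExtractorConfig
    retrieval: Retrieval  # join algorithm held fixed in this sketch


def estimate(plan, true_join_size):
    """Return (expected_good, expected_bad, seconds) under an independence assumption."""
    reach = plan.retrieval.doc_fraction
    # A correct join tuple appears only if both component tuples were reached and extracted.
    good = true_join_size * (reach * plan.left.recall) * (reach * plan.right.recall)
    # Assume join-output precision is the product of the component precisions.
    precision = plan.left.precision * plan.right.precision
    bad = good * (1 - precision) / precision  # from precision = good / (good + bad)
    seconds = plan.retrieval.num_docs * (plan.left.sec_per_doc + plan.right.sec_per_doc)
    return good, bad, seconds


def choose_plan(plans, true_join_size, min_good, max_bad):
    """Fastest plan meeting both quality constraints, or None if none qualifies."""
    feasible = []
    for p in plans:
        good, bad, secs = estimate(p, true_join_size)
        if good >= min_good and bad <= max_bad:
            feasible.append((secs, p))
    if not feasible:
        return None
    return min(feasible, key=lambda t: t[0])[1]


# Toy, made-up settings for illustration only.
strict = ExtractorConfig("strict", precision=0.9, recall=0.5, sec_per_doc=2.0)
loose = ExtractorConfig("loose", precision=0.6, recall=0.9, sec_per_doc=1.0)
scan = Retrieval("scan", doc_fraction=1.0, num_docs=10_000)
query = Retrieval("query", doc_fraction=0.6, num_docs=2_000)

plans = [Plan(l, r, ret) for l, r, ret in product([strict, loose], [strict, loose], [scan, query])]
best = choose_plan(plans, true_join_size=1000, min_good=200, max_bad=100)
fastest = min(plans, key=lambda p: estimate(p, 1000)[2])
print("chosen:", best.left.name, best.right.name, best.retrieval.name)           # chosen: strict strict scan
print("fastest:", fastest.left.name, fastest.right.name, fastest.retrieval.name)  # fastest: loose loose query
```

With these toy numbers, the fastest plan (two loose extractors over query-based retrieval) is estimated to emit over 500 bad tuples. The quality constraints therefore force the slowest plan, two strict extractors over a full scan. Different thresholds change the choice, which is the time/quality trade-off the abstract describes.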
Added 19 May 2010
Updated 19 May 2010
Type Conference
Year 2009
Where ICDE
Authors Alpa Jain, Panagiotis G. Ipeirotis, AnHai Doan, Luis Gravano