Join Optimization of Information Extraction Output: Quality Matters!

Information extraction (IE) systems are trained to extract specific relations from text databases. Real-world applications often require that the output of multiple IE systems be joined to produce the data of interest. To optimize the execution of a join of multiple extracted relations, it is not sufficient to consider only execution time. In fact, the quality of the join output is of critical importance: unlike in the relational world, different join execution plans can produce join results of widely different quality whenever IE systems are involved. In this paper, we develop a principled approach to understand, estimate, and incorporate output quality into the join optimization process over extracted relations. We argue that the output quality is affected by (a) the configuration of the IE systems used to process documents, (b) the document retrieval strategies used to retrieve documents, and (c) the actual join algorithm used. Our analysis considers several alternatives for ...
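As a rough illustration of why output quality can differ between plans, the sketch below joins two noisy extracted relations under two extractor configurations and measures precision and recall of the join result. Everything in it is an assumption made for illustration: the relations, the tuples, the confidence scores, and the use of a single confidence threshold as a stand-in for "IE system configuration". It is not the estimation model or optimizer developed in the paper.

```python
from itertools import product

# Hypothetical gold-standard facts (invented for this sketch).
GOLD_HQ = {("Acme", "Boston"), ("Globex", "Paris")}    # (company, headquarters)
GOLD_CEO = {("Acme", "Alice"), ("Globex", "Bob")}      # (company, ceo)

# Hypothetical extractor output: (tuple, confidence score).
EXTRACTED_HQ = [
    (("Acme", "Boston"), 0.9),
    (("Globex", "Paris"), 0.5),
    (("Acme", "Berlin"), 0.4),    # wrong extraction
]
EXTRACTED_CEO = [
    (("Acme", "Alice"), 0.8),
    (("Globex", "Bob"), 0.6),
    (("Globex", "Carol"), 0.3),   # wrong extraction
]


def select(extracted, threshold):
    """Keep tuples whose confidence meets the threshold (the assumed 'configuration')."""
    return {t for t, score in extracted if score >= threshold}


def join(left, right):
    """Equi-join two binary relations on their first attribute."""
    return {(k1, a, b) for (k1, a), (k2, b) in product(left, right) if k1 == k2}


def precision_recall(result, gold):
    """Precision and recall of a join result against the gold join."""
    good = result & gold
    precision = len(good) / len(result) if result else 1.0
    recall = len(good) / len(gold) if gold else 1.0
    return precision, recall


GOLD_JOIN = join(GOLD_HQ, GOLD_CEO)


def evaluate(threshold):
    """Run the join with both extractors set to the same threshold."""
    result = join(select(EXTRACTED_HQ, threshold), select(EXTRACTED_CEO, threshold))
    return precision_recall(result, GOLD_JOIN)


if __name__ == "__main__":
    for threshold in (0.7, 0.2):
        precision, recall = evaluate(threshold)
        print(f"threshold={threshold}: precision={precision}, recall={recall}")
```

On this toy data, the strict setting yields a small but fully correct join, while the permissive setting recovers every correct join tuple at the cost of admitting wrong ones. Both plans compute the same logical join, yet their outputs differ in quality, which is the kind of trade-off the abstract argues an optimizer should account for.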

Alpa Jain, Panagiotis G. Ipeirotis, AnHai Doan, Luis Gravano

Database | Execution Plans | Extracted Relations | ICDE 2009 | IE Systems |

» Distributed Resource Allocation for Synchronous Fork and Join Processing Networks

» Towards the Classical Communication Complexity of Entanglement Distillation Protocols with...

» Detection and segmentation of sweeps in color graphics images

» Contextual advertising for web article printing

Post Info

Added	19 May 2010
Updated	19 May 2010
Type	Conference
Year	2009
Where	ICDE
Authors	Alpa Jain, Panagiotis G. Ipeirotis, AnHai Doan, Luis Gravano
