Resource-Bounded Information Extraction: Acquiring Missing Feature Values on Demand



We present a general framework for the task of extracting specific information “on demand” from a large corpus such as the Web under resource constraints. Given a database with missing or uncertain information, the proposed system automatically formulates queries, issues them to a search interface, selects a subset of the documents, extracts the required information from them, and fills the missing values in the original database. We also exploit inherent dependencies within the data to obtain useful information with fewer computational resources. We build such a system in the citation database domain that extracts missing publication years from the Web using limited resources. We discuss a probabilistic approach for this task and present first results. The main contribution of this paper is to propose a general, comprehensive architecture for designing a system adaptable to different domains.
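
The pipeline described in the abstract (formulate a query, search, select a subset of documents, extract, fill) can be illustrated with a minimal Python sketch. This is not the authors' implementation. Everything named here is an assumption made for illustration: the in-memory `TOY_CORPUS` and `toy_search` stand in for a real search interface, the budget is counted in documents read, and a simple majority vote over years found by a regular expression replaces the probabilistic approach mentioned above. The sketch also does not model the dependencies within the data that the paper exploits.

```python
import re
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical pattern: four-digit years from 1950 to 2029.
YEAR_RE = re.compile(r"\b(?:19[5-9]\d|20[0-2]\d)\b")

# Hypothetical stand-in for the Web: title -> documents mentioning it.
TOY_CORPUS: Dict[str, List[str]] = {
    "Paper A": ["Paper A, Smith. In Proc. XYZ 2009.", "Smith (2009) Paper A"],
    "Paper C": ["Paper C appeared in 2007."],
}


def toy_search(query: str) -> List[str]:
    """Return every toy document whose title occurs in the query."""
    return [doc for title, docs in TOY_CORPUS.items()
            if title in query for doc in docs]


def formulate_query(record: dict) -> str:
    """Build a query from the fields the record already has."""
    return f'"{record["title"]}" {record.get("authors", "")}'.strip()


def fill_missing_years(records: List[dict],
                       search: Callable[[str], List[str]],
                       budget: int,
                       docs_per_query: int = 2) -> int:
    """Fill record['year'] where it is None, reading at most `budget` documents.

    Returns the number of documents actually read.
    """
    spent = 0
    for record in records:
        if record.get("year") is not None:
            continue  # nothing to acquire for this record
        if spent >= budget:
            break  # resource bound reached
        # Select a subset of the results, capped by the remaining budget.
        docs = search(formulate_query(record))[:docs_per_query]
        docs = docs[: budget - spent]
        spent += len(docs)
        # Extract candidate years and keep the most frequent one.
        votes = Counter(y for d in docs for y in YEAR_RE.findall(d))
        if votes:
            record["year"] = int(votes.most_common(1)[0][0])
    return spent
```

With a budget of two documents, only the first incomplete record is filled and the loop stops before issuing a query for the next one. Raising the budget lets the remaining records be processed. A real system would choose which queries and documents to spend the budget on rather than processing records in order.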

Pallika Kanani, Andrew McCallum, Shaohan Hu


Citation Database Domain | Data Mining | Fewer Computational Resources | Missing Values | PAKDD 2010


Post Info

Added	30 Aug 2010
Updated	30 Aug 2010
Type	Conference
Year	2010
Where	PAKDD
Authors	Pallika Kanani, Andrew McCallum, Shaohan Hu
