Scalable ad-hoc entity extraction from text collections

Supporting entity extraction from large document collections is important for enabling a variety of data analysis tasks. In this paper, we introduce the "ad-hoc" entity extraction task, where the entities of interest are constrained to come from a list of entities that is specific to the task. In such scenarios, traditional entity extraction techniques, which process all the documents for each ad-hoc entity extraction task, can be very expensive. We propose an efficient approach that leverages the inverted index on the documents to identify the subset of documents relevant to the task and processes only those documents. We demonstrate the efficiency of our techniques on real datasets.
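The idea of using an inverted index to narrow down the documents before extraction can be illustrated with a small sketch. This is not the paper's algorithm, only a simplified illustration under stated assumptions: entities are matched as exact, case-insensitive token sequences, the index maps each token to the set of document ids containing it, and all function names (`build_inverted_index`, `candidate_docs`, `extract`) are hypothetical.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"\w+")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN.findall(text.lower())


def build_inverted_index(docs):
    """Map each token to the set of ids of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for tok in tokenize(text):
            index[tok].add(doc_id)
    return index


def candidate_docs(index, entity):
    """Ids of documents that contain every token of the entity.

    Assumption: a document can only mention the entity if all of the
    entity's tokens occur in it, so intersecting posting lists is a
    safe filter (it may still return false positives).
    """
    tokens = tokenize(entity)
    if not tokens:
        return set()
    postings = [index.get(tok, set()) for tok in tokens]
    return set.intersection(*postings)


def contains_phrase(doc_tokens, phrase_tokens):
    """True if phrase_tokens occurs as a contiguous run in doc_tokens."""
    n = len(phrase_tokens)
    return any(
        doc_tokens[i:i + n] == phrase_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


def extract(docs, index, entities):
    """Verify entity mentions only in the candidate documents."""
    matches = defaultdict(set)
    for entity in entities:
        phrase = tokenize(entity)
        for doc_id in candidate_docs(index, entity):
            if contains_phrase(tokenize(docs[doc_id]), phrase):
                matches[doc_id].add(entity)
    return dict(matches)


# Illustrative data, not taken from the paper.
docs = {
    1: "The Canon EOS 5D is a popular camera.",
    2: "Nikon released a new lens for the EOS mount.",
    3: "The weather today is sunny.",
}
index = build_inverted_index(docs)
result = extract(docs, index, ["Canon EOS 5D", "Nikon"])
```

In this sketch, document 3 is never tokenized during extraction, and document 2 is skipped for "Canon EOS 5D" because it lacks two of the entity's tokens. The index lookup filters documents cheaply, and the more expensive phrase check runs only on the survivors.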

Sanjay Agrawal, Kaushik Chakrabarti, Surajit Chaudhuri, Venkatesh Ganti

Documents | Entity Extraction | Entity Extraction Task | PVLDB 2008

» Toward Completeness in Concept Extraction and Classification

» Topic sentiment mixture modeling facets and opinions in weblogs

» Entity categorization over large document collections

» Multidimensional Visualization and Navigation in Search Results

» Extraction of semantic biomedical relations from text using conditional random fields

» A system for finding biological entities that satisfy certain conditions from texts

» Unique Renaming of Java Using Source Transformation

» Extracting Useful Information from the Full Text of Fiction

Post Info

Added	28 Dec 2010
Updated	28 Dec 2010
Type	Journal
Year	2008
Where	PVLDB
Authors	Sanjay Agrawal, Kaushik Chakrabarti, Surajit Chaudhuri, Venkatesh Ganti
