Sciweavers

WWW
2006
ACM

Optimizing scoring functions and indexes for proximity search in type-annotated corpora

We introduce a new, powerful class of text proximity queries: find an instance of a given "answer type" (person, place, distance) near "selector" tokens matching given literals or satisfying given ground predicates. An example query is type=distance NEAR Hamburg Munich. Nearness is defined as a flexible, trainable parameterized aggregation function of the selectors, their frequency in the corpus, and their distance from the candidate answer. Such queries provide a key data reduction step for information extraction, data integration, question answering, and other text-processing applications. We describe the architecture of a next-generation information retrieval engine for such applications, and investigate two key technical problems faced in building it. First, we propose a new algorithm that estimates a scoring function from past logs of queries and answer spans. Plugging the scoring function into the query processor gives high accuracy: typically, an answer is f...
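To make the notion of "nearness" concrete, here is a minimal Python sketch of a proximity score that combines the three ingredients the abstract names: the selectors, their corpus frequency, and their distance from the candidate answer. Everything specific in it is an assumption made for illustration and is not taken from the paper: the function names (`selector_weight`, `proximity_score`, `best_candidate`), the IDF-style weight, the fixed exponential decay with `decay_rate`, and the choice to count only the nearest occurrence of each selector. The paper estimates its scoring function from logs of queries and answer spans; this sketch simply hard-codes one plausible shape.

```python
import math


def selector_weight(doc_freq, num_docs):
    """Assumed IDF-style weight: selectors that are rarer in the corpus count more."""
    return math.log(1.0 + num_docs / doc_freq)


def proximity_score(candidate_pos, selector_hits, doc_freqs, num_docs, decay_rate=0.1):
    """Score one candidate answer token by its nearness to the selector matches.

    candidate_pos: token offset of the candidate answer (e.g. a distance mention).
    selector_hits: dict mapping each selector to the token offsets where it matches.
    doc_freqs:     dict mapping each selector to its document frequency.
    decay_rate:    hypothetical fixed parameter; a trained system would fit the decay.
    """
    score = 0.0
    for selector, positions in selector_hits.items():
        if not positions:
            continue  # a selector that never matches contributes nothing
        # Assumption: only the nearest occurrence of each selector matters.
        nearest_gap = min(abs(p - candidate_pos) for p in positions)
        weight = selector_weight(doc_freqs[selector], num_docs)
        score += weight * math.exp(-decay_rate * nearest_gap)
    return score


def best_candidate(candidate_positions, selector_hits, doc_freqs, num_docs, decay_rate=0.1):
    """Return the candidate offset with the highest proximity score."""
    return max(
        candidate_positions,
        key=lambda pos: proximity_score(pos, selector_hits, doc_freqs, num_docs, decay_rate),
    )


# Toy usage for a query like: type=distance NEAR Hamburg Munich
hits = {"hamburg": [3], "munich": [5]}      # made-up token offsets
freqs = {"hamburg": 10, "munich": 20}       # made-up document frequencies
winner = best_candidate([7, 40], hits, freqs, num_docs=1000)
```

In this toy input the candidate at offset 7 sits two to four tokens from both selectors, while the one at offset 40 is far from both, so the nearer candidate receives the higher score. Replacing the fixed exponential with a decay profile fitted to data is the part the abstract describes as trainable.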
Soumen Chakrabarti, Kriti Puniyani, Sujatha Das
Added 22 Nov 2009
Updated 22 Nov 2009
Type Conference
Year 2006
Where WWW
Authors Soumen Chakrabarti, Kriti Puniyani, Sujatha Das