Optimizing scoring functions and indexes for proximity search in type-annotated corpora

We introduce a new, powerful class of text proximity queries: find an instance of a given "answer type" (person, place, distance) near "selector" tokens matching given literals or satisfying given ground predicates. An example query is type=distance NEAR Hamburg Munich. Nearness is defined as a flexible, trainable parameterized aggregation function of the selectors, their frequency in the corpus, and their distance from the candidate answer. Such queries provide a key data reduction step for information extraction, data integration, question answering, and other text-processing applications. We describe the architecture of a next-generation information retrieval engine for such applications, and investigate two key technical problems faced in building it. First, we propose a new algorithm that estimates a scoring function from past logs of queries and answer spans. Plugging the scoring function into the query processor gives high accuracy: typically, an answer is f...
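The nearness score described above can be illustrated with a minimal sketch. It assumes, as a simplification, that each selector contributes its inverse document frequency times a decay value for its closest occurrence to the candidate, and that these contributions are summed. The names `score_candidates`, `decay`, and the `weights` table are hypothetical. The weights only stand in for parameters that would be learned from query logs, and this is not the authors' exact function.

```python
import math
from collections import defaultdict


def idf(term, doc_freq, num_docs):
    # Smoothed inverse document frequency: rarer selectors count more.
    return math.log((num_docs + 1) / (doc_freq.get(term, 0) + 1))


def decay(distance, weights):
    # Hypothetical trainable decay: weights[d] is the credit for a gap of d
    # tokens; selectors farther away than the table covers contribute nothing.
    return weights[distance] if distance < len(weights) else 0.0


def score_candidates(tokens, types, selectors, doc_freq, num_docs,
                     answer_type, weights):
    """Score every token annotated with `answer_type` by nearby selectors."""
    positions = defaultdict(list)
    for i, tok in enumerate(tokens):
        if tok in selectors:
            positions[tok].append(i)

    scores = {}
    for i in range(len(tokens)):
        if types[i] != answer_type:
            continue
        total = 0.0
        for sel in selectors:
            # Only the closest occurrence of each selector is counted.
            best = max((decay(abs(i - p), weights) for p in positions[sel]),
                       default=0.0)
            total += idf(sel, doc_freq, num_docs) * best
        scores[i] = total
    return scores


# Toy query "type=distance NEAR Hamburg Munich"; D1 and D2 are placeholders
# for two tokens annotated as distances. All numbers below are made up.
tokens = ["Hamburg", "to", "Munich", "D1", "and", "Berlin", "to", "Rome", "D2"]
types = [None, None, None, "distance", None, None, None, None, "distance"]
selectors = ["Hamburg", "Munich"]
doc_freq = {"Hamburg": 9, "Munich": 9}
weights = [1.0, 0.8, 0.6, 0.4, 0.2]  # hypothetical learned decay for gaps 0..4

scores = score_candidates(tokens, types, selectors, doc_freq, 99,
                          "distance", weights)
print(max(scores, key=scores.get))  # prints 3 (the position of D1)
```

Taking only the closest occurrence per selector keeps a frequent selector from dominating the score through repetition alone. A learned decay table is one simple way to make nearness "trainable" as the abstract describes.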

Soumen Chakrabarti, Kriti Puniyani, Sujatha Das

Document Trec Corpus | Internet Technology | Text Proximity Queries | TREC Queries | WWW 2006

Added	22 Nov 2009
Updated	22 Nov 2009
Type	Conference
Year	2006
Where	WWW
Authors	Soumen Chakrabarti, Kriti Puniyani, Sujatha Das
