Inverted indexes for phrases and strings


Inverted indexes are the most fundamental and widely used data structures in information retrieval. For each unique word occurring in a document collection, the inverted index stores a list of the documents in which this word occurs. Compression techniques are often applied to further reduce the space requirement of these lists. However, the index has a shortcoming: only predefined pattern queries can be supported efficiently. For string documents, where word boundaries are undefined, indexing all the substrings of a given document makes the storage quickly become quadratic in the data size. Likewise, if we apply the same type of index to queries on phrases or sequences of words, the inverted index ends up storing redundant information. In this paper, we show the first set of inverted indexes which work naturally for strings as well as phrase searching. The central idea is to exclude document d in the inverted list of a string P if every occu...
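To make the abstract's starting point concrete, here is a minimal sketch of a classic word-level inverted index, together with a count of distinct substrings that illustrates why indexing every substring grows quadratically. It is not the method proposed in the paper (the abstract is truncated before that is described). The function names `build_inverted_index` and `count_distinct_substrings`, the whitespace tokenization, and the sample documents are all assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each word to the sorted list of ids of the documents containing it.

    Assumption: words are lowercased, whitespace-separated tokens.
    """
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for word in text.lower().split():
            postings[word].add(doc_id)
    return {word: sorted(ids) for word, ids in postings.items()}


def count_distinct_substrings(text):
    """Count distinct non-empty substrings by brute force.

    A string of length n has up to n * (n + 1) / 2 of them, which is the
    quadratic growth the abstract refers to.
    """
    n = len(text)
    return len({text[i:j] for i in range(n) for j in range(i + 1, n + 1)})


# Hypothetical sample collection, used only for illustration.
docs = ["the quick brown fox", "the lazy dog", "quick quick dog"]
index = build_inverted_index(docs)
```

With this sketch, `index["quick"]` lists documents 0 and 2. A phrase query such as "quick dog" cannot be answered from these lists alone, because they record only which documents contain each word and not where the words occur.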

Manish Patil, Sharma V. Thankachan, Rahul Shah, Wing-Kai Hon, Jeffrey Scott Vitter, Sabrina Chandrasekaran


Information Technology | Inverted Index | Optimal Techniques | SIGIR 2011 | Word Boundaries |


» Inverted Index based Modified Version of KNN for Text Categorization

» TinyLex: static n-gram index pruning with perfect recall

» Compressed Inverted Indexes for In-Memory Search Engines

» Dual-Sorted Inverted Lists

» Searching Large Lexicons for Partially Specified Terms using Compressed Inverted Files

» Answering approximate string queries on large data sets using external memory

» Effective and efficient object-based image retrieval using visual phrases

» Compression, Indexing, and Retrieval for Massive String Data

Post Info

Added	17 Sep 2011
Updated	17 Sep 2011
Type	Journal
Year	2011
Where	SIGIR
Authors	Manish Patil, Sharma V. Thankachan, Rahul Shah, Wing-Kai Hon, Jeffrey Scott Vitter, Sabrina Chandrasekaran
