AINA
2009
IEEE

Document-Oriented Pruning of the Inverted Index in Information Retrieval Systems

Searching very large collections can be costly in both computation and storage. To reduce this cost, recent research has focused on reducing the size of (pruning) the inverted index. The inverted index can be viewed as a table whose rows and columns are the terms in the lexicon and the documents in the collection, respectively. A non-zero entry in the table, known as a posting, indicates that the corresponding document contains the term. Previous research on static index pruning was either (i) posting-oriented, in which less important postings are removed from the table, or (ii) term-oriented, in which less important terms (entire rows) are removed from the table. In this paper, we investigate a new, document-oriented pruning strategy that removes entire columns of the table, i.e., removes less important documents from the collection. Three methods for estimating the importance of a document are proposed. Methods 1 and 2 are dependent on the score function of the retrieval system (e.g. Okapi BM25), while ...
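To make the table view concrete, here is a minimal Python sketch of document-oriented pruning on a toy collection. It is an illustration only, not the authors' implementation: the function names `build_index` and `prune_documents` are hypothetical, and the importance score (the number of distinct terms in a document) is a placeholder assumption standing in for the three estimation methods, which the truncated abstract does not describe.

```python
from collections import defaultdict


def build_index(docs):
    """Build a toy inverted index: term -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)


def prune_documents(index, importance, keep):
    """Document-oriented pruning: keep only the `keep` most important documents.

    Removing a document deletes its whole column, i.e. every posting that
    refers to it. Terms left with no postings are dropped as well.
    """
    ranked = sorted(importance, key=importance.get, reverse=True)
    kept = set(ranked[:keep])
    pruned = {}
    for term, postings in index.items():
        remaining = postings & kept
        if remaining:
            pruned[term] = remaining
    return pruned


# Toy collection (made up for illustration).
docs = {
    "d1": "inverted index pruning",
    "d2": "index",
    "d3": "static pruning of the inverted index",
}
index = build_index(docs)

# Placeholder importance (an assumption, NOT one of the paper's methods):
# the number of distinct terms in each document.
importance = {doc_id: len(set(text.split())) for doc_id, text in docs.items()}

pruned = prune_documents(index, importance, keep=2)
```

In this sketch the least important document under the placeholder score, `d2`, disappears from every posting list at once. A posting-oriented scheme would instead drop individual (term, document) entries, and a term-oriented scheme would drop whole keys of `index`.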
Lei Zheng, Ingemar J. Cox
Added 16 Feb 2011
Updated 16 Feb 2011
Type Journal
Year 2009
Where AINA
Authors Lei Zheng, Ingemar J. Cox