Entry Pairing in Inverted File



Abstract. This paper proposes exploiting content and usage information to rearrange the inverted index of a full-text IR system. The idea is to merge the entries of two terms that frequently co-occur, either in the collection or in the answered queries, into a single paired entry. Since postings common to the paired terms are not replicated, the resulting index is more compact. In addition, queries containing terms that have been paired are answered faster because they can exploit the pre-computed posting intersection. To choose which terms to pair, we formulate the term pairing problem as a Maximum-Weight Matching problem on a graph, and we evaluate the efficiency and efficacy of both an exact and a heuristic solution in our scenario. We apply our technique (i) to compact a compressed inverted file built on an actual Web collection of documents, and (ii) to increase the capacity of an in-memory posting list. Experiments showed that in the first case our approach can improve the c...
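The pairing idea in the abstract can be illustrated with a small sketch. Everything below is an assumption made for illustration, not the authors' implementation: the edge weight is taken to be the number of shared postings (the abstract also allows co-occurrence in answered queries), the matching is a generic greedy heuristic rather than the exact or heuristic solver evaluated in the paper, and the names `greedy_pairing`, `build_paired_index`, `postings`, and `and_query` are hypothetical.

```python
from itertools import combinations


def greedy_pairing(index):
    """Greedy heuristic for maximum-weight matching (an assumed stand-in).

    Nodes are terms; the assumed edge weight is the number of shared postings.
    """
    edges = []
    for s, t in combinations(sorted(index), 2):
        weight = len(set(index[s]) & set(index[t]))
        if weight > 0:
            edges.append((weight, s, t))
    # Heaviest edges first; ties broken alphabetically for determinism.
    edges.sort(key=lambda e: (-e[0], e[1], e[2]))
    matched, pairs = set(), []
    for _, s, t in edges:
        if s not in matched and t not in matched:
            pairs.append((s, t))
            matched.update((s, t))
    return pairs


def build_paired_index(index):
    """Merge each matched pair into one entry: (only_s, common, only_t)."""
    paired, partner = {}, {}
    for s, t in greedy_pairing(index):
        a, b = set(index[s]), set(index[t])
        paired[(s, t)] = (sorted(a - b), sorted(a & b), sorted(b - a))
        partner[s] = partner[t] = (s, t)
    unpaired = {term: sorted(p) for term, p in index.items()
                if term not in partner}
    return paired, unpaired, partner


def postings(term, paired, unpaired, partner):
    """Rebuild the full posting list of a single term."""
    if term not in partner:
        return unpaired[term]
    key = partner[term]
    only_s, common, only_t = paired[key]
    own = only_s if term == key[0] else only_t
    return sorted(own + common)


def and_query(s, t, paired, unpaired, partner):
    """Answer 's AND t'; paired terms reuse the stored intersection."""
    if s != t and s in partner and partner[s] == partner.get(t):
        return paired[partner[s]][1]
    left = set(postings(s, paired, unpaired, partner))
    right = set(postings(t, paired, unpaired, partner))
    return sorted(left & right)


# Toy collection: term -> sorted list of document identifiers.
index = {
    "new": [1, 2, 3, 5, 8],
    "york": [2, 3, 5, 9],
    "inverted": [4, 6, 7],
    "index": [4, 6, 10],
}
paired, unpaired, partner = build_paired_index(index)
original_size = sum(len(p) for p in index.values())
paired_size = (sum(len(part) for entry in paired.values() for part in entry)
               + sum(len(p) for p in unpaired.values()))
print(original_size, paired_size)
print(and_query("new", "york", paired, unpaired, partner))
```

In this toy collection the two pairs store 10 postings instead of the original 15, because the shared documents are kept once per pair, and the conjunctive query on a paired couple is read directly from the stored common part.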

Hoang Thanh Lam, Raffaele Perego, Nguyen Thoi Minh Quan, Fabrizio Silvestri


Computer Science | In-memory Posting List | Pre-computed Posting Intersection | Queries Containing Terms | WISE 2009 |


» HAT a hardware assisted TOPDOC inverted index component

» On Using Query Logs for Static Index Pruning

Post Info

Added	08 Mar 2010
Updated	08 Mar 2010
Type	Conference
Year	2009
Where	WISE
Authors	Hoang Thanh Lam, Raffaele Perego, Nguyen Thoi Minh Quan, Fabrizio Silvestri
