Dual-Sorted Inverted Lists

15 years 5 months ago


Several IR tasks rely, to achieve high efficiency, on a single pervasive data structure called the inverted index. This is a mapping from the terms in a text collection to the documents where they appear, plus some supplementary data. Different orderings in the list of documents associated to a term, and different supplementary data, fit widely different IR tasks. Index designers have to choose the right order for one such task, rendering the index difficult to use for others. In this paper we introduce a general technique, based on wavelet trees, to maintain a single data structure that offers the combined functionality of two independent orderings for an inverted index, with competitive efficiency and within the space of one compressed inverted index. We show in particular that the technique allows combining an ordering by decreasing term frequency (useful for ranked document retrieval) with an ordering by increasing document identifier (useful for phrase and Boolean queries). ...
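To make the two orderings concrete, here is a minimal Python sketch of a plain inverted index whose posting lists can be read either by decreasing term frequency or by increasing document identifier. It is not the wavelet-tree structure the paper proposes: it simply re-sorts each list on demand, which is the kind of duplication the paper aims to avoid. All function names and the toy collection are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_index(docs):
    """Map each term to a list of (doc_id, term_frequency) pairs."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for term, tf in Counter(text.lower().split()).items():
            index[term].append((doc_id, tf))
    return index


def by_doc_id(postings):
    """Ordering by increasing document identifier (Boolean and phrase queries)."""
    return sorted(postings)


def by_frequency(postings):
    """Ordering by decreasing term frequency (ranked retrieval); ties by doc_id."""
    return sorted(postings, key=lambda p: (-p[1], p[0]))


def top_k(index, term, k):
    """Return the k documents in which `term` occurs most often."""
    return [doc_id for doc_id, _ in by_frequency(index.get(term, []))[:k]]


def boolean_and(index, term_a, term_b):
    """Intersect two posting lists by merging them in doc_id order."""
    a = [doc_id for doc_id, _ in by_doc_id(index.get(term_a, []))]
    b = [doc_id for doc_id, _ in by_doc_id(index.get(term_b, []))]
    i = j = 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return result


# Hypothetical toy collection; document identifiers are list positions.
docs = ["a rose is a rose", "a tree", "rose tree rose rose"]
index = build_index(docs)
```

With this toy collection, `top_k(index, "rose", 2)` reads the list for "rose" in frequency order, while `boolean_and(index, "rose", "tree")` relies on the document-identifier order to merge the two lists in a single pass.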

Gonzalo Navarro, Simon J. Puglisi
