Factorization-based lossless compression of inverted indices

Many large-scale Web applications that require ranked top-k retrieval are implemented using inverted indices. An inverted index represents a sparse term-document matrix, where non-zero elements indicate the strength of term-document associations. In this work, we present an approach for lossless compression of inverted indices. Our approach maps terms in a document corpus to a new term space in order to reduce the number of non-zero elements in the term-document matrix, resulting in a more compact inverted index. We formulate the problem of selecting a new term space as a matrix factorization problem, and prove that finding the optimal solution is NP-hard. We develop a greedy algorithm for finding an approximate solution. A side effect of our approach is an increase in the number of terms in the index, which may negatively affect query evaluation performance. To eliminate this effect, we develop a methodology for modifying query evaluation algorithms by exploiting specific p...
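To make the idea of trading extra terms for fewer non-zero entries concrete, here is a minimal Python sketch. It is an illustration only and not the paper's greedy algorithm: it assumes binary term-document associations (no weights), performs a single step that merges one pair of terms, and the names `count_postings`, `factor_best_pair`, `lookup`, and `covers` are hypothetical.

```python
from itertools import combinations

def count_postings(index):
    """Number of non-zero entries in the (binary) term-document matrix."""
    return sum(len(docs) for docs in index.values())

def factor_best_pair(index, covers):
    """One illustrative step: move the documents shared by the most-overlapping
    pair of terms into a new meta-term. Each shared document then costs one
    posting instead of two. Returns True if a meta-term was created."""
    best = max(combinations(sorted(index), 2),
               key=lambda pair: len(index[pair[0]] & index[pair[1]]),
               default=None)
    if best is None:
        return False
    a, b = best
    shared = index[a] & index[b]
    if len(shared) < 2:          # too little overlap to be worth a new term
        return False
    meta = f"{a}+{b}"            # hypothetical naming scheme for the new term
    index[meta] = set(shared)
    index[a] -= shared
    index[b] -= shared
    covers[meta] = {a, b}        # remember which original terms the meta-term stands for
    return True

def lookup(index, covers, term):
    """Recover the original posting list of `term` (the mapping is lossless)."""
    docs = set(index.get(term, ()))
    for meta, originals in covers.items():
        if term in originals:
            docs |= index[meta]
    return docs

# Toy corpus: term -> set of document ids.
index = {"new": {1, 2, 3, 4}, "york": {1, 2, 3, 5}, "city": {2, 5}}
original = {t: set(d) for t, d in index.items()}
before = count_postings(index)
covers = {}
factor_best_pair(index, covers)
after = count_postings(index)
print(before, after)
```

In this toy corpus the index gains one term while the total number of postings drops, and `lookup` still returns every original posting list. It also shows the side effect the abstract mentions: answering a query for one original term now requires reading more than one posting list.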

George Beskales, Marcus Fontoura, Maxim Gurevich, Sergei Vassilvitskii, Vanja Josifovski

CIKM 2011 | Document Matrix | Evaluation Algorithms | Information Storage And Retrieval | Information Technology

Post Info

Added	13 Dec 2011
Updated	13 Dec 2011
Type	Journal
Year	2011
Where	CIKM
Authors	George Beskales, Marcus Fontoura, Maxim Gurevich, Sergei Vassilvitskii, Vanja Josifovski
