Compact full-text indexing of versioned document collections

We study the problem of creating highly compressed full-text index structures for versioned document collections, that is, collections that contain multiple versions of each document. Important examples of such collections are Wikipedia and the web page archive maintained by the Internet Archive. A straightforward indexing approach would simply treat each document version as a separate document, so that index size scales linearly with the number of versions. However, several authors have recently studied approaches that exploit the significant similarities between different versions of the same document to obtain much smaller index sizes. In this paper, we propose new techniques for organizing and compressing inverted index structures for such collections. We also perform a detailed experimental comparison of the new techniques with the existing techniques in the literature. Our results on an archive of the English version of Wikipedia, and on a subset of the Internet Archive collection,...
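
To make the contrast in the abstract concrete, the following is a minimal Python sketch, not the method proposed in the paper. It compares the straightforward approach (one posting per document version) with one simple way of exploiting similarity between versions: storing a single posting for each run of consecutive versions that contain a term. The function names, the toy collection, and the run-based posting format are all assumptions made for illustration.

```python
from collections import defaultdict


def naive_index(collection):
    """Straightforward approach: every version is treated as its own document."""
    index = defaultdict(list)
    for doc_id, versions in collection.items():
        for v, text in enumerate(versions):
            for term in sorted(set(text.split())):
                index[term].append((doc_id, v))
    return index


def range_index(collection):
    """Illustrative alternative (an assumption, not the paper's technique):
    one posting (doc_id, first_version, last_version) per maximal run of
    consecutive versions that contain the term."""
    index = defaultdict(list)
    for doc_id, versions in collection.items():
        open_runs = {}  # term -> first version of its current run
        for v, text in enumerate(versions):
            terms = set(text.split())
            # Close the run of every term that disappeared in this version.
            for term in [t for t in open_runs if t not in terms]:
                index[term].append((doc_id, open_runs.pop(term), v - 1))
            # Open a run for every term that newly appeared.
            for term in terms:
                open_runs.setdefault(term, v)
        # Close the runs that are still open after the last version.
        last = len(versions) - 1
        for term, start in open_runs.items():
            index[term].append((doc_id, start, last))
    return index


def posting_count(index):
    """Total number of postings, used here as a crude proxy for index size."""
    return sum(len(postings) for postings in index.values())


# Hypothetical toy collection: one document with three similar versions.
collection = {
    "doc0": [
        "compact index for versioned collections",
        "compact inverted index for versioned collections",
        "compact inverted index for versioned document collections",
    ],
}

print(posting_count(naive_index(collection)))
print(posting_count(range_index(collection)))
```

In this toy case the naive index grows with every added version, while the run-based index only grows when a term appears or disappears. Real systems would also compress the postings themselves, which this sketch does not attempt.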

Jinru He, Hao Yan, Torsten Suel

CIKM 2009 | Database | Index Structures | Inverted Index | Versioned Documents

Post Info

Added	26 May 2010
Updated	26 May 2010
Type	Conference
Year	2009
Where	CIKM
Authors	Jinru He, Hao Yan, Torsten Suel
