Inverted index compression, block addressing, and sequential search on compressed text are three techniques that have been developed separately for efficient, low-overhead text retrieval. Modern text compression techniques can reduce the text to less than 30% of its size and allow searching it directly, faster than searching the uncompressed text. Inverted index compression significantly reduces the size of the index at the same processing speed. Block addressing makes the inverted lists point to text blocks instead of exact positions and pays for the reduction in space with some sequential text scanning. In this work we combine the three ideas in a single scheme. We present a compressed inverted file that indexes compressed text and uses block addressing. We consider different techniques to compress the index and study their performance with respect to the block size. We compare the index against the three separate techniques for varying block sizes, showing that our index is superior to each isola...
Gonzalo Navarro, Edleno Silva de Moura, Marden S.
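
The block addressing idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`build_block_index`, `to_gaps`, `from_gaps`, `search`), the fixed block size counted in words, and the sample text are all assumptions made for illustration. The text is left uncompressed here, and gap encoding is shown only as one common preprocessing step for inverted list compression.

```python
from collections import defaultdict

def build_block_index(words, block_size):
    """Map each word to the sorted ids of the blocks that contain it."""
    index = defaultdict(list)
    for pos, word in enumerate(words):
        block = pos // block_size
        # Block ids arrive in increasing order, so only the last entry can repeat.
        if not index[word] or index[word][-1] != block:
            index[word].append(block)
    return dict(index)

def to_gaps(block_ids):
    """Replace each block id by its difference from the previous one."""
    return [b - a for a, b in zip([0] + block_ids[:-1], block_ids)]

def from_gaps(gaps):
    """Invert to_gaps by accumulating the differences."""
    ids, total = [], 0
    for g in gaps:
        total += g
        ids.append(total)
    return ids

def search(words, index, block_size, query):
    """Find exact positions by sequentially scanning only the candidate blocks."""
    hits = []
    for block in index.get(query, []):
        start = block * block_size
        for pos in range(start, min(start + block_size, len(words))):
            if words[pos] == query:
                hits.append(pos)
    return hits

# Hypothetical sample text, split into blocks of 4 words.
text = "the cat sat on the mat and the dog sat on the log".split()
index = build_block_index(text, block_size=4)
```

With a larger block size the lists get shorter (fewer distinct block ids per word), but `search` has to scan more text per candidate block, which is the space/time trade-off the abstract refers to.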