Sciweavers

SODA
2001
ACM

A linear lower bound on index size for text retrieval

14 years 1 months ago
A linear lower bound on index size for text retrieval
Most information-retrieval systems preprocess the data to produce an auxiliary index structure. Empirically, it has been observed that there is a tradeoff between query response time and the size of the index. When indexing a large corpus, such as the web, the size of the index is an important consideration. In this case it would be ideal to produce an index that is substantially smaller than the text. In this work we prove a linear worst-case lower bound on the size of any index that reports the location (if any) of a substring in the text in time proportional to the length of the pattern. In other words, an index supporting linear-time substring searches requires about as much space as the original text. Here "time" is measured in the number of bit probes to the text; an arbitrary amount of computation may be done on an arbitrary amount of the index. Our lower bound applies to inverted word indices as well.
Erik D. Demaine, Alejandro López-Ortiz
Added 31 Oct 2010
Updated 31 Oct 2010
Type Conference
Year 2001
Where SODA
Authors Erik D. Demaine, Alejandro López-Ortiz
Comments (0)