SODA
2001
ACM

A linear lower bound on index size for text retrieval

Most information-retrieval systems preprocess the data to produce an auxiliary index structure. Empirically, it has been observed that there is a tradeoff between query response time and the size of the index. When indexing a large corpus, such as the web, the size of the index is an important consideration. In this case it would be ideal to produce an index that is substantially smaller than the text. In this work we prove a linear worst-case lower bound on the size of any index that reports the location (if any) of a substring in the text in time proportional to the length of the pattern. In other words, an index supporting linear-time substring searches requires about as much space as the original text. Here "time" is measured in the number of bit probes to the text; an arbitrary amount of computation may be done on an arbitrary amount of the index. Our lower bound applies to inverted word indices as well.
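To make the kind of index discussed in the abstract concrete, here is a minimal sketch of a naive suffix trie in Python. It is an illustration only, not the construction or the proof technique from the paper. The names `Node`, `build_suffix_trie`, `find`, and `count_nodes` are hypothetical. A lookup walks one trie edge per pattern character and never reads the text, so it answers in time proportional to the pattern length, but the index it relies on is far larger than the text itself.

```python
class Node:
    """One trie node; the path from the root spells a substring of the text."""
    __slots__ = ("children", "first_pos")

    def __init__(self, first_pos):
        self.children = {}          # maps a character to a child Node
        self.first_pos = first_pos  # start of the leftmost occurrence


def build_suffix_trie(text):
    """Insert every suffix of `text`; quadratic size in the worst case."""
    root = Node(0)
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            child = node.children.get(ch)
            if child is None:
                # Suffixes are inserted left to right, so the first suffix
                # that creates a node gives the leftmost occurrence.
                child = Node(start)
                node.children[ch] = child
            node = child
    return root


def find(root, pattern):
    """Return the leftmost start of `pattern` in the text, or None."""
    node = root
    for ch in pattern:  # one dictionary lookup per pattern character
        node = node.children.get(ch)
        if node is None:
            return None
    return node.first_pos


def count_nodes(root):
    """Count trie nodes as a rough proxy for index size."""
    total, stack = 0, [root]
    while stack:
        node = stack.pop()
        total += 1
        stack.extend(node.children.values())
    return total


if __name__ == "__main__":
    text = "banana"
    index = build_suffix_trie(text)
    print(find(index, "ana"), find(index, "xyz"))
    print(len(text), count_nodes(index))
```

The node count grows with the number of distinct substrings, which is the extreme end of the tradeoff: fast queries bought with a large index. The result in the abstract says that, in the worst case, no index answering such queries with a number of bit probes to the text proportional to the pattern length can be substantially smaller than the text.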

Erik D. Demaine, Alejandro López-Ortiz

Post Info

Added	31 Oct 2010
Updated	31 Oct 2010
Type	Conference
Year	2001
Where	SODA
Authors	Erik D. Demaine, Alejandro López-Ortiz
