Fast Searching on Compressed Text Allowing Errors

Abstract We present a fast compression and decompression scheme for natural language texts that allows efficient and flexible string matching by searching the compressed text directly. The compression scheme uses a word-based Huffman encoding and the coding alphabet is byte-oriented rather than bit-oriented. We compress typical English texts to about 30% of their original size, against 40% and 35% for Compress and Gzip, respectively. Compression times are close to the times of Compress and approximately half the times of Gzip, and decompression times are lower than those of Gzip and one third of those of Compress. The searching algorithm allows a large number of variations of the exact and approximate compressed string matching problem, such as phrases, ranges, complements, wild cards and arbitrary regular expressions. Separators and stopwords can be discarded at search time without significantly increasing the cost. The algorithm is based on a word-oriented shift-or algorithm and a fast Boy...
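To make the "word-oriented shift-or" idea in the abstract concrete, here is a minimal sketch of the textbook Shift-Or bit-parallel matcher with whole words, rather than characters, as the symbols. It is an illustration under stated assumptions, not the paper's implementation: it runs over a plain list of tokens instead of byte-oriented Huffman codewords, it handles exact phrases only, and the name shift_or_phrase_search is hypothetical.

```python
def shift_or_phrase_search(text_words, phrase_words):
    """Return the start indices where phrase_words occurs contiguously
    in text_words, using Shift-Or with words as the alphabet symbols.

    Hypothetical helper for illustration; it works on uncompressed tokens.
    """
    m = len(phrase_words)
    if m == 0:
        return []

    all_ones = (1 << m) - 1

    # masks[w] has bit i cleared exactly when phrase_words[i] == w.
    masks = {}
    for i, w in enumerate(phrase_words):
        masks[w] = masks.get(w, all_ones) & ~(1 << i)

    # Bit i of state is 0 iff phrase_words[0..i] matches the text
    # ending at the current word.
    state = all_ones
    accept_bit = 1 << (m - 1)
    hits = []

    for j, w in enumerate(text_words):
        # Shift in a 0 (the empty prefix always matches), then OR in the
        # mask so only prefixes extended by the current word stay at 0.
        state = ((state << 1) | masks.get(w, all_ones)) & all_ones
        if state & accept_bit == 0:
            hits.append(j - m + 1)

    return hits


if __name__ == "__main__":
    text = "the quick brown fox jumps over the quick brown dog".split()
    print(shift_or_phrase_search(text, ["quick", "brown"]))
```

Because the state is a single integer with one bit per phrase word, each text word costs a constant number of bit operations for short phrases. Searching compressed text directly would additionally require mapping each pattern word to its codeword, which this sketch does not attempt.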

Edleno Silva de Moura, Gonzalo Navarro, Nivio Zivi

Algorithm | Compressed Text | Information Management | SIGIR 1998 | String Matching

» Fast In-Memory XPath Search using Compressed Indexes

» Direct Pattern Matching on Compressed Text

» Fast Approximate String Matching in a Dictionary

» A New Searchable Variable-to-Variable Compressor

» An Index for Two Dimensional String Matching Allowing Rotations

» Adding Compression to Block Addressing Inverted Indexes

» Speeding Up Pattern Matching by Text Compression

» A General Practical Approach to Pattern Matching over Ziv-Lempel Compressed Text

Post Info

Added	05 Aug 2010
Updated	05 Aug 2010
Type	Conference
Year	1998
Where	SIGIR
Authors	Edleno Silva de Moura, Gonzalo Navarro, Nivio Ziviani, Ricardo A. Baeza-Yates
