The increasing importance of Unicode for text files, for example with Java and in some modern operating systems, implies a possible doubling of data storage space and data transmi...
In this paper we present a new dictionary-based preprocessing technique and its implementation called TWRT (Two-level Word Replacing Transformation). Our preprocessor uses several...
We present a scalable algorithm for the parallel computation of inverted files for large text collections. The algorithm takes into account an environment of a high-bandwidth netw...
Berthier A. Ribeiro-Neto, Joao Paulo Kitajima, Gon...
Lossless compression researchers have developed highly sophisticated approaches, such as Huffman encoding, arithmetic encoding, the Lempel-Ziv family, Dynamic Markov Compression (D...
Fauzia S. Awan, Nan Zhang, Nitin Motgi, Raja ...
Word-based Huffman coding is widely used in information retrieval systems. Besides its compression power, it also enables the implementation of both indexing and searching sch...