Lossless compression researchers have developed highly sophisticated approaches, such as Huffman encoding, arithmetic encoding, the Lempel-Ziv family, Dynamic Markov Compression (DMC), Prediction by Partial Matching (PPM), and Burrows-Wheeler Transform (BWT) based algorithms. We propose an alternative approach in this paper to develop a reversible transformation that can be applied to a source text that improves existing algorithm's ability to compress. The basic idea behind our approach is to encode every word in the input text file, which is also found in the English text dictionary that we are using, as a word in our transformed static dictionary. These transformed words give shorter length for most of the input words and also retain some context and redundancy. Thus we achieve some compression at the preprocessing stage as well as retain enough context and redundancy for the compression algorithms to give better results. Bzip2 with our proposed text transform, LIPT, gives 5.24...
Fauzia S. Awan, Nan Zhang 0005, Nitin Motgi, Raja