This paper improves the Tagged Suboptimal Codes (TSC) compression scheme in several ways. We show how to process the TSC as a universal code. We introduce the TSCk as a family of universal codes, where TSC0 is the original TSC. Instead of constructing an optimal code such as Huffman, we choose the best near-optimal code from the TSCk family of universal prefix codes. We introduce a fast decoding technique that uses compact transition tables to decode the compressed data as bytes. We adapt the Aho-Corasick pattern matching algorithm to use the same compact tables that are used in the decoding process, in order to perform fast pattern matching in the TSCk compressed domain. These improvements make the TSCk compression scheme fast and compact. The encoding, decoding and search times of the TSCk compression scheme are similar. This makes the TSCk an ideal compression scheme for processing text in a streaming mode on a machine/device that has a limited memor...
S. Harrusi, Amir Averbuch, N. Rabin