Compression of Unicode Files

Available from www.cs.auckland.ac.nz

The increasing importance of Unicode for text files, for example in Java and in some modern operating systems, implies a possible doubling of data storage space and data transmission time, with a corresponding need for data compression. However, it is not clear that data compressors designed for 8-bit byte data are well matched to 16-bit Unicode data. This paper investigates the compression of Unicode files, using a variety of established data compressors on a mix of genuine and artificial Unicode files. It finds that while Ziv-Lempel and unbounded-context compressors work well, finite-context compressors are less satisfactory on Unicode. Tests with a simple special compressor intended for 16-bit data show that it may be useful to design compressors specifically for Unicode files.
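The size and modelling issues described in the abstract can be explored with a short experiment. The sketch below is not the authors' test setup: it assumes the sample text is representable in Latin-1 (so it has a one-byte-per-character form), uses UTF-16LE as the 16-bit encoding, uses `zlib` (DEFLATE, a Ziv-Lempel-family compressor) as a stand-in for the LZ compressors tested, and uses an empirical order-1 conditional entropy as a crude stand-in for a finite-context model. The function names `order1_entropy` and `compare` are hypothetical.

```python
import math
import zlib
from collections import Counter


def order1_entropy(symbols):
    """Empirical order-1 conditional entropy H(X_i | X_{i-1}) in bits per symbol."""
    pairs = Counter(zip(symbols, symbols[1:]))  # (context, next symbol) counts
    contexts = Counter(symbols[:-1])            # how often each context occurs
    n = len(symbols) - 1                        # number of (context, symbol) pairs
    h = 0.0
    for (ctx, _sym), count in pairs.items():
        # p(ctx, sym) * log2 p(sym | ctx)
        h -= (count / n) * math.log2(count / contexts[ctx])
    return h


def compare(text):
    """Compare an 8-bit and a 16-bit encoding of the same (Latin-1) text."""
    # Assumption: one byte per character, valid only for Latin-1 text
    b8 = text.encode("latin-1")
    # Two bytes per character: UTF-16 little-endian, no byte-order mark
    b16 = text.encode("utf-16-le")
    # Reassemble the 16-bit code units as integers
    units = [int.from_bytes(b16[i:i + 2], "little") for i in range(0, len(b16), 2)]
    return {
        "raw_8bit": len(b8),
        "raw_16bit": len(b16),
        "zlib_8bit": len(zlib.compress(b8, 9)),
        "zlib_16bit": len(zlib.compress(b16, 9)),
        # Byte-level order-1 context on 16-bit data: the previous byte is often
        # the other half of the same character, not the previous character
        "h1_bytes_16bit": order1_entropy(list(b16)),
        # Order-1 context over whole 16-bit units: the previous symbol is the
        # previous character, as a 16-bit-aware model would see it
        "h1_units_16bit": order1_entropy(units),
    }


if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog " * 50
    for key, value in compare(sample).items():
        print(key, value)
```

The raw 16-bit size is always twice the 8-bit size here, which is the doubling the abstract refers to. The two entropy figures show how the same one-symbol context means one character to a 16-bit model but only half a character to a byte-oriented one; real finite-context compressors use longer contexts and adaptive estimates, so this is only an illustration of the mismatch, not a reproduction of the paper's measurements.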

Peter M. Fenwick, Simon Brierley

Compressors | Computer Graphics | DCC 1998 | Unicode | Unicode Files

Post Info

Added	04 Aug 2010
Updated	04 Aug 2010
Type	Conference
Year	1998
Where	DCC
Authors	Peter M. Fenwick, Simon Brierley
