XML simplifies data exchange among heterogeneous computers, but it is notoriously verbose and has spawned the development of many XML-specific compressors and binary formats. We present an XML test corpus and a combined efficiency metric that integrates compression ratio and execution speed. Using this corpus and linear regression, we assess 14 general-purpose and XML-specific compressors relative to the proposed metric. We also identify the key factors to consider when selecting a compressor. Our results show that XMill or WBXML may be useful in some instances, but a general-purpose compressor is often the best choice.

Categories and Subject Descriptors
E.4 [Data]: Coding and Information Theory--Data Compaction and Compression; H.3.4 [Systems and Software]: Performance Evaluation (efficiency and effectiveness)

General Terms
Algorithms, Measurement, Performance, Experimentation

Keywords
XML, corpus, compression, binary format, linear regression
Christopher J. Augeri, Dursun A. Bulutoglu, Barry