We present XCQ (XML Compression and Querying System), a tool for compressing and querying XML documents. XCQ is built on a novel technique called DTD Tree and SAX Event Stream Parsing (DSP), which is designed to compress XML documents that conform to a given DTD efficiently and without requiring user expertise. DSP achieves a compression ratio comparable to that of XMill. Compressed documents in XCQ use a partitioned path-based data grouping, which supports query evaluation without full decompression. We demonstrate with examples how to query compressed documents in XCQ.

Keywords: XML, compression, querying, SAX parsing
Wai Yeung Lam, Wilfred Ng, Peter T. Wood, Mark Lev
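
To illustrate the general idea of path-based data grouping named in the abstract, the following is a minimal sketch and not the XCQ or DSP implementation. It assumes that text values are grouped by their root-to-leaf element path during a SAX parse and that each group is compressed separately, so a query on one path decompresses only that group. The names `PathGrouper`, `compress_by_path`, and `query` are hypothetical, zlib stands in for whatever compressor the system actually uses, and the sketch ignores the DTD, attributes, and the document structure needed for full reconstruction.

```python
import xml.sax
import zlib
from collections import defaultdict


class PathGrouper(xml.sax.ContentHandler):
    """Collects text values into one container per root-to-leaf path."""

    def __init__(self):
        super().__init__()
        self.stack = []                      # names of currently open elements
        self.containers = defaultdict(list)  # path -> list of text values
        self.buffer = []                     # text seen since the last tag

    def startElement(self, name, attrs):
        self.stack.append(name)
        self.buffer = []

    def characters(self, content):
        self.buffer.append(content)

    def endElement(self, name):
        text = "".join(self.buffer).strip()
        if text:
            path = "/" + "/".join(self.stack)
            self.containers[path].append(text)
        self.buffer = []
        self.stack.pop()


def compress_by_path(xml_text):
    """Return {path: compressed block}, one independent block per path."""
    handler = PathGrouper()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    # NUL cannot occur in XML character data, so it is a safe separator.
    return {
        path: zlib.compress("\x00".join(values).encode("utf-8"))
        for path, values in handler.containers.items()
    }


def query(blocks, path):
    """Decompress only the block for `path`; other blocks stay compressed."""
    return zlib.decompress(blocks[path]).decode("utf-8").split("\x00")


doc = (
    "<lib>"
    "<book><title>A</title><year>1999</year></book>"
    "<book><title>B</title><year>2003</year></book>"
    "</lib>"
)
blocks = compress_by_path(doc)
titles = query(blocks, "/lib/book/title")
```

In this sketch, retrieving the titles touches only the block stored under `/lib/book/title`; the block for `/lib/book/year` is never decompressed, which is the property that makes partial decompression possible.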