Indexing XML is crucial for efficient XML query processing. We propose a compact tree (Ctree) for XML indexing that provides not only concise path summaries at the group level but also detailed child-parent relationships at the element level. Based on Ctree, we are able to measure how well XML data is structured. We also propose a three-step query processing method. Its efficiency is achieved by (1) summarizing large XML data structures into a condensed Ctree; (2) pruning irrelevant groups to significantly reduce the search space; (3) eliminating join operations between the matches for value predicates and those for structure constraints; and (4) using Ctree properties such as regular groups to reduce query processing time. Our experiments reveal that Ctree is an effective data structure for managing XML data.

Categories and Subject Descriptors: E.1 [Data Structures]: trees
General Terms: Algorithms, Measurement, Design, Performance
Keywords: Path summary, compact tree, XML index, XQuery proce...
Qinghua Zou, Shaorong Liu, Wesley W. Chu
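
The abstract describes an index that keeps path summaries at the group level and child-parent links at the element level. The following is a minimal sketch of that general idea, not the authors' Ctree implementation. The names `Group`, `build_summary`, and `parents_with_value` are hypothetical, and the layout (one group per distinct label path, one parent position per element) is an assumption made only for illustration.

```python
import xml.etree.ElementTree as ET


class Group:
    """Hypothetical summary node: all elements that share one label path."""

    def __init__(self, path, parent_group):
        self.path = path                  # label path, e.g. ("lib", "book", "title")
        self.parent_group = parent_group  # Group of the parent path, None for the root
        self.parent_index = []            # parent_index[i]: position of element i's parent
        self.values = []                  # values[i]: stripped text of element i


def build_summary(xml_text):
    """Group elements by label path and record each element's parent position."""
    root = ET.fromstring(xml_text)
    groups = {}

    def visit(elem, path, parent_group, parent_pos):
        group = groups.get(path)
        if group is None:
            group = groups[path] = Group(path, parent_group)
        pos = len(group.values)           # position of this element inside its group
        group.parent_index.append(parent_pos)
        group.values.append((elem.text or "").strip())
        for child in elem:
            visit(child, path + (child.tag,), group, pos)

    visit(root, (root.tag,), None, -1)    # the root element has no parent: -1
    return groups


def parents_with_value(groups, path, value):
    """Positions (in the parent group) of parents whose child at `path` equals `value`."""
    group = groups.get(path)
    if group is None:                     # no such path: nothing to search
        return []
    return [group.parent_index[i] for i, v in enumerate(group.values) if v == value]


doc = "<lib><book><title>A</title><author>x</author></book><book><title>B</title></book></lib>"
summary = build_summary(doc)
matches = parents_with_value(summary, ("lib", "book", "title"), "B")
```

In this sketch, a lookup touches only the group for the requested path, and a value match is mapped to its parent element through the stored position rather than by comparing two separate result lists.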