XML is widely praised for its flexibility in allowing repeated and missing sub-elements. However, this flexibility makes it challenging to develop a bulk algebra, which typically manipulates sets of objects with identical structure. A set of XML elements, say of type book, may have members that vary greatly in structure, e.g. in the number of author sub-elements. This kind of heterogeneity may permeate the entire document in a recursive fashion: e.g., different authors of the same or different book may in turn greatly vary in structure. Even when the document conforms to a schema, the flexible nature of schemas for XML still allows such significant variations in structure among elements in a collection. Bulk processing of such heterogeneous sets is problematic. In this paper, we introduce the notion of logical classes (LC) of pattern tree nodes, and generalize the notion of pattern tree matching to handle node logical classes. This abstraction pays off significantly in allowing us to ...
Stelios Paparizos, Yuqing Wu, Laks V. S. Lakshmana