The ubiquitous adoption of XML as the standard for data exchange over the web has led to increased interest in building efficient and scalable XML publish-subscribe (pub-sub) systems. The central function of an XML-based pub-sub system is to perform XML filtering efficiently, i.e., to identify those XPath expressions that have a match in a streaming XML document. In this paper, we propose a new sequence-based approach that transforms both XML documents and XPath twig expressions into Node Encoded Tree Sequences (NETS). In terms of this encoding, we provide a necessary and sufficient condition for an XPath twig to have a match in a given XML document. The proposed filtering procedure is based on a new subsequence matching algorithm devised for NETS, which identifies the set of matched queries, free of false positives, in a single scan of the XML document. Extensive experimental results show that the NETS method outperforms previous XML filtering approaches.
Mariam Salloum, Vassilis J. Tsotras
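
To make the filtering task described in the abstract concrete, the following is a minimal sketch of the problem statement only: given a set of XPath twig subscriptions, report which ones have at least one match in a document. It is not the NETS method. It assumes a naive, non-streaming baseline that parses the whole document and evaluates each query independently with the limited XPath subset of Python's standard `xml.etree.ElementTree`. The function name `filter_queries`, the sample document, and the query identifiers are hypothetical and chosen for illustration.

```python
import xml.etree.ElementTree as ET


def filter_queries(xml_text, queries):
    """Return the ids of the queries that have at least one match.

    Naive baseline (an assumption for illustration, not the NETS algorithm):
    the document is fully parsed, then every query is evaluated separately.
    """
    root = ET.fromstring(xml_text)
    matched = set()
    for query_id, path in queries.items():
        # find() returns the first matching element, or None if there is none.
        if root.find(path) is not None:
            matched.add(query_id)
    return matched


# Hypothetical document and subscriptions.
DOC = (
    "<catalog>"
    "<book><title>XML</title><author>A</author></book>"
    "<cd><title>Hits</title></cd>"
    "</catalog>"
)
QUERIES = {
    "q1": ".//book[author]/title",  # twig: a book with both an author and a title child
    "q2": ".//cd[artist]/title",    # twig: a cd with both an artist and a title child
    "q3": "./book/title",           # simple path query
}

if __name__ == "__main__":
    print(sorted(filter_queries(DOC, QUERIES)))
```

In this sketch, `q1` and `q3` are reported and `q2` is not, because no `cd` element has an `artist` child. The cost grows with the number of subscriptions, since each query is evaluated on its own; this per-query cost is what makes filtering efficiency the central concern for a pub-sub system with many subscriptions.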