Many systems, such as Tukwila and YFilter, combine automaton and algebra techniques to process queries over tokenized XML streams. In this architecture, an automaton is typically used first to locate all query patterns in the input stream and to compose the matched tokens into XML element nodes. These XML element nodes are then passed to tuple-based algebraic operators for further filtering or restructuring. However, this common processing style is not always optimal. At times, it is more efficient to retrieve only a subset of the patterns in the automaton and to retrieve the remaining patterns from the composed XML element nodes. In this paper, we propose a cost-based solution to explore this novel optimization opportunity. We design three plan optimization algorithms, namely MinExhaust, GreedyBasic, and FastPrune. We also study how to migrate from a currently running plan to a new plan in a safe and efficient manner. Our experiments show that the GreedyBasic or FastPrune algorithm can quickly f...
Hong Su, Elke A. Rundensteiner, Murali Mani
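The abstract does not spell out the cost model or the three algorithms, so the following is only an illustrative sketch of the placement decision it describes, under a hypothetical toy cost model. It assumes that each pattern has a fixed cost when matched in the automaton, a fixed cost when extracted from composed element nodes, and a one-time buffering cost whenever at least one pattern is left to the algebraic operators. It contrasts an exhaustive search with a greedy search over placements. The functions `plan_cost`, `exhaustive_plan`, and `greedy_plan` and all numbers are invented for illustration; this is not the paper's MinExhaust, GreedyBasic, or FastPrune.

```python
"""Toy pattern-placement optimizer (hypothetical cost model, not the paper's)."""
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional, Tuple

Costs = Dict[str, int]
Plan = Tuple[FrozenSet[str], int]


def plan_cost(in_automaton: FrozenSet[str], auto_cost: Costs,
              alg_cost: Costs, buffer_cost: int) -> int:
    """Cost of matching `in_automaton` in the automaton and extracting every
    other pattern from composed element nodes (hypothetical toy model)."""
    cost = 0
    for p in auto_cost:
        cost += auto_cost[p] if p in in_automaton else alg_cost[p]
    # Assumed one-time cost of buffering element content for the algebra,
    # paid only if at least one pattern is left to the algebraic operators.
    if any(p not in in_automaton for p in auto_cost):
        cost += buffer_cost
    return cost


def exhaustive_plan(patterns: List[str], auto_cost: Costs,
                    alg_cost: Costs, buffer_cost: int) -> Plan:
    """Enumerate all 2^n automaton subsets and keep the cheapest one."""
    full = frozenset(patterns)
    best: Plan = (full, plan_cost(full, auto_cost, alg_cost, buffer_cost))
    for k in range(len(patterns) + 1):
        for combo in combinations(patterns, k):
            subset = frozenset(combo)
            cost = plan_cost(subset, auto_cost, alg_cost, buffer_cost)
            if cost < best[1]:
                best = (subset, cost)
    return best


def greedy_plan(patterns: List[str], auto_cost: Costs,
                alg_cost: Costs, buffer_cost: int) -> Plan:
    """Start with every pattern in the automaton, then repeatedly move out the
    single pattern whose removal lowers the cost most; stop when none helps."""
    current = frozenset(patterns)
    current_cost = plan_cost(current, auto_cost, alg_cost, buffer_cost)
    while True:
        best_move: Optional[Plan] = None
        for p in sorted(current):
            candidate = current - {p}
            cost = plan_cost(candidate, auto_cost, alg_cost, buffer_cost)
            if cost < current_cost and (best_move is None or cost < best_move[1]):
                best_move = (candidate, cost)
        if best_move is None:
            return current, current_cost
        current, current_cost = best_move


if __name__ == "__main__":
    # Invented example numbers for three hypothetical patterns a, b, c.
    pats = ["a", "b", "c"]
    auto = {"a": 5, "b": 8, "c": 2}
    alg = {"a": 1, "b": 2, "c": 3}
    print(exhaustive_plan(pats, auto, alg, 4))  # (frozenset({'c'}), 9)
    print(greedy_plan(pats, auto, alg, 4))      # (frozenset({'c'}), 9)
```

In this made-up instance, keeping only pattern `c` in the automaton (cost 9) beats the conventional all-in-automaton plan (cost 15), and the greedy search happens to reach the exhaustive optimum. Under other cost models a greedy search can stop at a local minimum, which is the usual trade-off between exhaustive and greedy plan search.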