Because of the high volume and unpredictable arrival rate, stream processing systems may not always be able to keep up with the input data streams-- resulting in buffer overflow and uncontrolled loss of data. Load shedding, the prevalent strategy for solving this overflow problem, has so far only been considered for relational stream processing, but not for XML. Shedding applied to XML stream processing brings new opportunities and challenges due to complex nested nature of XML structures. In this paper, we tackle this unsolved XML shedding problem using a three-pronged approach. First, we develop an XQuery preference model that enables users to specify the relative importance of preserving different subpatterns in the XML result structure. This transforms shedding into the problem of rewriting the user query into shed queries that return approximate query answers with utility as measured by the given user preference model. Second, we develop a cost model to compare the performance of...
Mingzhu Wei, Elke A. Rundensteiner, Murali Mani