Distributed stream query services must simultaneously process a large number of complex, continuous queries with stringent performance requirements while utilizing distributed processing resources. In this paper, we present the design and evaluation of a distributed stream query service that achieves massive scalability, a key design principle for such systems, by reusing the same distributed operator across multiple, different concurrent queries. We present concrete techniques that exploit the well-defined semantics of CQL-style queries to reduce the cost of query deployment and duplicate processing, thereby increasing system throughput and scalability. Our system exhibits several unique features, including: (1) a 'reuse lattice' to encode both operator similarity and network locality using a uniform data structure; (2) techniques to generate an optimized query grouping plan in the form of 'relaxed operators' to capitalize on reuse opport...
Sangeetha Seshadri, Bhuvan Bamba, Brian F. Cooper,