In this paper, we present an adaptive load diffusion operator that enables scalable processing of Multiway Windowed Stream Joins (MWSJs) on a cluster system. Load diffusion is achieved by a set of novel semantics-preserving tuple routing algorithms. Unlike previous work, the load diffusion operator can (1) preserve the MWSJ semantics while spreading tuples to different hosts for parallel join processing; (2) achieve fine-grained load balancing among distributed hosts; and (3) perform semantics-preserving online adaptations to maintain optimal performance in dynamic stream environments. We have implemented a prototype of the distributed MWSJ framework on top of the System S distributed stream processing system. Our experimental results, based on both real data streams and synthetic workloads, show that the load diffusion algorithms can efficiently scale up the performance of MWSJ processing with low overhead.
Xiaohui Gu, Philip S. Yu, Haixun Wang