A large class of applications requires real-time processing of continuous stream data, which has led to the development of data stream management systems (DSMSs). Since many of these applications are distributed, distributed DSMSs are starting to receive attention. In this paper, we focus on an important issue in distributed DSMS operation, namely load distribution to minimize end-to-end latency. We identify the often conflicting requirements of load distribution and propose a "potential-driven" load distribution approach that mimics the movement of objects in the physical world. Our approach also takes into account heterogeneous machines, different network conditions, and resource constraints. We present experimental results that investigate our algorithms from various aspects and show that they outperform existing techniques in terms of end-to-end latency.

Categories and Subject Descriptors
H.2.4 [Database Management]: Systems--Distributed databases

General Terms
Algorithms, Desi...
Weihan Wang, Mohamed A. Sharaf, Shimin Guo, M. Tam