Partitioned query processing is an effective method for processing continuous queries with large stateful operators in a distributed system. This method typically partitions the input data into nonoverlapping portions, with each query plan instance installed on a separate machine and processing only one portion of the data. Dynamic redistribution of load among machines is then employed to handle fluctuating stream characteristics. However, existing load redistribution solutions implicitly assume that no local query optimization is conducted at runtime on any of the participating machines, i.e., all local query plan instances are static and thus remain identical. This is restrictive for dynamic stream systems, where data partitions may experience significant fluctuations in selectivities or arrival rates over time, thus warranting local plan reoptimization. This raises the new problem that the heterogeneity of plan shapes among different machines must be tackled when doing loa...
Yali Zhu, Elke A. Rundensteiner