We consider approximate join processing over data streams when memory limitations cause incoming tuples to overflow the available space, precluding exact processing. Selective eviction of tuples (load-shedding) is needed, but is challenging since data distributions and arrival rates are unknown a priori. Also, in many real-world applications, such as those for the stock market and sensor data, different items may have different importance levels. Current methods pay little attention to load-shedding when tuples bear such importance semantics, and perform poorly due to premature tuple drops and unproductive tuple retention. We propose a novel framework, called iJoin, which overcomes these drawbacks and also gives tuples a fair chance of being part of the join result. Our load-shedding scheme for iJoin maximizes the total importance of join results, and allows reconfiguration of tuple importance. We also show how to trade off load-shedding overhead against approximation error. Our experimen...
Dhananjay Kulkarni, Chinya V. Ravishankar