Continuous sampling for online aggregation over multiple queries

15 years 11 months ago


In this paper, we propose an online aggregation system called COSMOS (Continuous Sampling for Multiple queries in an Online aggregation System) to process multiple aggregate queries efficiently. In COSMOS, a dataset is first scrambled so that a sequential scan of the dataset yields a stream of random samples for all queries. Moreover, COSMOS organizes queries into a dissemination graph to exploit the dependencies across queries. In this way, the aggregates of queries closer to the root (the source of the data flow) can potentially be used to compute the aggregates of descendant (dependent) queries. COSMOS applies a statistical approach to combine the answers from ancestor nodes and generate the online aggregates for a node. COSMOS also offers a partitioning strategy to further salvage intermediate answers. We have implemented COSMOS in PostgreSQL and conducted an extensive experimental study. Our results on the TPC-H benchmark show the efficiency and effectiveness of COSMOS. Categories a...
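The scramble-then-scan idea can be illustrated with a minimal Python sketch. It is not the COSMOS implementation: the names `scramble`, `RunningAvg`, and `shared_scan` are hypothetical, the dissemination graph and partitioning strategy are left out, and the confidence interval is a plain normal approximation that ignores the finite-population correction. It only shows how one sequential pass over a shuffled dataset can feed running AVG estimates for several queries at once.

```python
import math
import random


def scramble(rows, seed=0):
    """Return a randomly permuted copy of rows (stand-in for the scrambling step)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled


class RunningAvg:
    """Running mean and variance (Welford's method) for one AVG query."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def estimate(self, z=1.96):
        """Return (mean, half_width) of an approximate confidence interval."""
        if self.n < 2:
            return self.mean, float("inf")
        variance = self.m2 / (self.n - 1)
        return self.mean, z * math.sqrt(variance / self.n)


def shared_scan(rows, queries, limit=None):
    """Feed one sequential scan to every query.

    queries maps a query name to (predicate, value_extractor).
    limit stops the scan early to mimic reading only a prefix of the data.
    """
    stats = {name: RunningAvg() for name in queries}
    for scanned, row in enumerate(rows):
        if limit is not None and scanned >= limit:
            break
        for name, (predicate, value) in queries.items():
            if predicate(row):
                stats[name].add(value(row))
    return stats


if __name__ == "__main__":
    # Synthetic data; the column names are assumptions for illustration only.
    data = [{"price": float(i % 100), "region": i % 4} for i in range(10000)]
    queries = {
        "avg_all": (lambda r: True, lambda r: r["price"]),
        "avg_region0": (lambda r: r["region"] == 0, lambda r: r["price"]),
    }
    shuffled = scramble(data, seed=42)
    for prefix in (500, 2000, 10000):
        stats = shared_scan(shuffled, queries, limit=prefix)
        for name, stat in stats.items():
            mean, half_width = stat.estimate()
            print(f"{prefix:>6} rows  {name}: {mean:.2f} +/- {half_width:.2f}")
```

Because the rows are shuffled first, any prefix of the scan behaves like a random sample, so the interval for each query tends to narrow as more rows are read, and a full scan reproduces the exact average.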

Sai Wu, Beng Chin Ooi, Kian-Lee Tan
