
CLOUD 2010, ACM

Comet: batched stream processing for data intensive distributed computing

Batched stream processing is a new distributed data processing paradigm that models recurring batch computations on incrementally bulk-appended data streams. The model is inspired by our empirical study of a trace from a large-scale production data-processing cluster; it allows a set of effective query optimizations that are not possible in a traditional batch processing model. We have developed a query processing system called Comet that embraces batched stream processing and integrates with DryadLINQ. We used two complementary methods to evaluate the effectiveness of the optimizations that Comet enables. First, a prototype system deployed on a 40-node cluster shows an I/O reduction of over 40% on our benchmark. Second, when applied to a real production trace covering over 19 million machine-hours, our simulator shows an estimated I/O saving of over 50%.

Categories and Subject Descriptors: C.2.4 [Computer-communication networks]: Distributed systems—Distributed databases; H.2.4 [Da...
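The paradigm is easiest to see in code. Below is a minimal sketch in Python of the batched stream processing model as the abstract describes it: recurring queries are aligned to a stream's bulk-appended segments, so each new segment triggers all of them, and they share a single scan of the new data. The names (BatchedStream, register, append_segment) are hypothetical, not Comet's API; Comet itself expresses queries in DryadLINQ, and its optimizations operate on normalized query plans rather than in-memory lists.

```python
# A minimal sketch of the batched stream processing idea, not Comet's
# actual API: BatchedStream, register, and append_segment are
# hypothetical names. Recurring queries re-run on each newly
# bulk-appended segment of a stream, and queries aligned to the same
# segment share one scan of the new data instead of each re-reading it.

from typing import Callable, Dict, List

Record = dict
Query = Callable[[List[Record]], List[Record]]

class BatchedStream:
    """A data stream that grows by bulk-appended segments."""

    def __init__(self, name: str):
        self.name = name
        self.queries: List[Query] = []  # recurring queries on this stream

    def register(self, query: Query) -> None:
        """Register a recurring query to run on every future segment."""
        self.queries.append(query)

    def append_segment(self, segment: List[Record]) -> Dict[str, List[Record]]:
        # The segment is scanned (here: materialized) once and fed to
        # every registered query; in a traditional batch model, each
        # query would independently rescan the stream from storage.
        return {f"query_{i}": q(segment) for i, q in enumerate(self.queries)}

# Usage: two recurring queries aligned on the same click-log stream.
clicks = BatchedStream("clicks")
clicks.register(lambda seg: [r for r in seg if r["country"] == "US"])  # filter
clicks.register(lambda seg: [{"count": len(seg)}])                     # count

print(clicks.append_segment([{"country": "US"}, {"country": "DE"}]))
# {'query_0': [{'country': 'US'}], 'query_1': [{'count': 2}]}
```

In this toy form the shared scan is just a shared in-memory list, but it mirrors where the paper's I/O savings come from: co-scheduling queries on segment arrival so the cluster reads each new segment once rather than once per query.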
Type: Conference
Year: 2010
Where: CLOUD
Authors: Bingsheng He, Mao Yang, Zhenyu Guo, Rishan Chen, Bing Su, Wei Lin, Lidong Zhou