Comet: batched stream processing for data intensive distributed computing



Batched stream processing is a new distributed data processing paradigm that models recurring batch computations on incrementally bulk-appended data streams. The model is inspired by our empirical study of a trace from a large-scale production data-processing cluster; it enables a set of effective query optimizations that are not possible in a traditional batch processing model. We have developed a query processing system called Comet that embraces batched stream processing and integrates with DryadLINQ. We used two complementary methods to evaluate the effectiveness of the optimizations that Comet enables. First, a prototype system deployed on a 40-node cluster shows an I/O reduction of over 40% on our benchmark. Second, when applied to a real production trace covering over 19 million machine-hours, our simulator shows an estimated I/O saving of over 50%.

Categories and Subject Descriptors: C.2.4 [Computer-communication networks]: Distributed systems—Distributed databases; H.2.4 [Da...
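As a minimal sketch of how the batched stream model can reduce I/O, the Python example below assumes one plausible optimization: each newly appended batch is scanned once, and that single scan feeds several recurring queries that keep their results incrementally. It is compared with a baseline in which every query rescans all accumulated data on every run. All names (`RecurringQuery`, `BatchedStream`, `naive_records_read`) and the toy records are hypothetical; this is not Comet's or DryadLINQ's API, and the record counter is only a rough stand-in for I/O.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical record type: each record is a dict of integer-valued fields.
Record = Dict[str, int]


@dataclass
class RecurringQuery:
    """Hypothetical recurring query: counts records per value of one field,
    updating its result incrementally instead of recomputing from scratch."""
    name: str
    key: str
    counts: Dict[int, int] = field(default_factory=dict)

    def consume(self, record: Record) -> None:
        value = record[self.key]
        self.counts[value] = self.counts.get(value, 0) + 1


class BatchedStream:
    """Bulk-appended stream: each appended batch is scanned exactly once,
    and that single scan is shared by all registered recurring queries."""

    def __init__(self) -> None:
        self.queries: List[RecurringQuery] = []
        self.records_read = 0  # simple proxy for input I/O

    def register(self, query: RecurringQuery) -> None:
        self.queries.append(query)

    def append_batch(self, batch: List[Record]) -> None:
        for record in batch:  # one pass over the new batch only
            self.records_read += 1
            for query in self.queries:
                query.consume(record)


def naive_records_read(batch_sizes: List[int], num_queries: int) -> int:
    """Baseline: after each append, every query rescans all data seen so far."""
    total, seen = 0, 0
    for size in batch_sizes:
        seen += size
        total += num_queries * seen
    return total


# Usage: two recurring queries over two bulk-appended batches.
stream = BatchedStream()
by_user = RecurringQuery("by_user", "user")
by_page = RecurringQuery("by_page", "page")
stream.register(by_user)
stream.register(by_page)
stream.append_batch([{"user": 1, "page": 10}, {"user": 2, "page": 10}, {"user": 1, "page": 11}])
stream.append_batch([{"user": 2, "page": 11}, {"user": 3, "page": 10}])
print(stream.records_read, naive_records_read([3, 2], 2))
```

With two queries and batches of 3 and 2 records, the shared incremental scan reads 5 records in total, while the rescan-everything baseline reads 2 × 3 + 2 × 5 = 16. The gap widens as more batches accumulate, which is the intuition behind the I/O savings the abstract reports, though the actual optimizations and numbers come from the paper, not from this toy.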

Bingsheng He, Mao Yang, Zhenyu Guo, Rishan Chen, Bing Su, Wei Lin, Lidong Zhou


CLOUD 2010 | Distributed And Parallel Computing | Distributed Data Processing | Query Processing | Stream Processing |


Post Info

Added	10 Jul 2010
Updated	10 Jul 2010
Type	Conference
Year	2010
Where	CLOUD
Authors	Bingsheng He, Mao Yang, Zhenyu Guo, Rishan Chen, Bing Su, Wei Lin, Lidong Zhou
