Stateful bulk processing for incremental analytics

This work addresses the need for stateful dataflow programs that can rapidly sift through huge, evolving data sets. These data-intensive applications perform complex multi-step computations over successive generations of data inflows, such as weekly web crawls, daily image/video uploads, log files, and growing social networks. While programmers may simply re-run the entire dataflow when new data arrives, this is grossly inefficient, increasing result latency and squandering hardware resources and energy. Alternatively, programmers may use prior results to incrementally incorporate the changes. However, current large-scale data processing tools, such as Map-Reduce or Dryad, limit how programmers incorporate and use state in data-parallel programs. Straightforward approaches to incorporating state can result in custom, fragile code and disappointing performance. This work presents a generalized architecture for continuous bulk processing (CBP) that raises the level of abstraction f...
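The contrast the abstract draws between re-running the entire dataflow and incrementally incorporating changes can be illustrated with a minimal sketch. This is not the CBP interface from the paper. The function names `full_recompute` and `incremental_update`, the URL-count workload, and the sample batches are all assumptions chosen only for illustration.

```python
from collections import Counter


def full_recompute(batches):
    """Re-run the whole computation over every batch seen so far."""
    counts = Counter()
    for batch in batches:
        for record in batch:
            counts[record] += 1
    return counts


def incremental_update(state, new_batch):
    """Fold only the newly arrived batch into the saved state."""
    for record in new_batch:
        state[record] += 1
    return state


# Hypothetical successive generations of input (e.g. periodic crawls).
batches = [["a.com", "b.com"], ["a.com", "c.com"], ["a.com"]]

state = Counter()  # the state carried from one run to the next
for i, batch in enumerate(batches, start=1):
    state = incremental_update(state, batch)
    # Both strategies agree; the incremental one touches only new data.
    assert state == full_recompute(batches[:i])
```

In this toy case the incremental path does work proportional to the new batch, while the full re-run does work proportional to all data seen so far. That growing gap is the inefficiency the abstract describes.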

CLOUD 2010 | Data Movement | Distributed And Parallel Computing | Large-scale Data Processing | Stateful Dataflow Programs

Post Info

Added	10 Jul 2010
Updated	10 Jul 2010
Type	Conference
Year	2010
Where	CLOUD
Authors	Dionysios Logothetis, Christopher Olston, Benjamin Reed, Kevin C. Webb, Ken Yocum
