Sciweavers

CLOUD
2010
ACM

Stateful bulk processing for incremental analytics

This work addresses the need for stateful dataflow programs that can rapidly sift through huge, evolving data sets. These data-intensive applications perform complex multi-step computations over successive generations of data inflows, such as weekly web crawls, daily image/video uploads, log files, and growing social networks. While programmers may simply re-run the entire dataflow when new data arrives, this is grossly inefficient, increasing result latency and squandering hardware resources and energy. Alternatively, programmers may use prior results to incrementally incorporate the changes. However, current large-scale data processing tools, such as Map-Reduce or Dryad, limit how programmers incorporate and use state in data-parallel programs. Straightforward approaches to incorporating state can result in custom, fragile code and disappointing performance. This work presents a generalized architecture for continuous bulk processing (CBP) that raises the level of abstraction f...
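As a rough illustration of the contrast the abstract draws between re-running an entire dataflow and reusing prior results, the sketch below keeps a per-key count as state and folds each new batch into it. This is not the CBP architecture or its API. The function names (`full_recompute`, `incremental_update`) and the sample data are hypothetical, and a simple counter stands in for the multi-step computations the abstract describes.

```python
from collections import Counter

def full_recompute(all_batches):
    """Baseline: reprocess every batch seen so far from scratch."""
    counts = Counter()
    for batch in all_batches:
        for record in batch:
            counts[record] += 1
    return counts

def incremental_update(state, new_batch):
    """Fold only the new batch into the state kept from prior runs."""
    for record in new_batch:
        state[record] += 1
    return state

# Hypothetical successive generations of input (e.g. periodic crawls).
batches = [["a.com", "b.com"], ["a.com", "c.com"], ["b.com", "a.com"]]

state = Counter()
for i, batch in enumerate(batches, start=1):
    state = incremental_update(state, batch)
    # Both approaches agree, but the incremental one touches only new records.
    assert state == full_recompute(batches[:i])
```

The baseline's work grows with the total data seen so far, while the incremental version's work grows only with the size of the new batch, at the cost of having to store and manage `state` between runs.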
Added 10 Jul 2010
Updated 10 Jul 2010
Type Conference
Year 2010
Where CLOUD
Authors Dionysios Logothetis, Christopher Olston, Benjamin Reed, Kevin C. Webb, Ken Yocum