Fault-tolerant stream processing using a distributed, replicated file system

We present SGuard, a new fault-tolerance technique for distributed stream processing engines (SPEs) running in clusters of commodity servers. SGuard is less disruptive to normal stream processing than previous proposals and leaves more resources available for it. Like several previous schemes, SGuard is based on rollback recovery [18]: it periodically checkpoints the state of stream processing nodes and restarts failed nodes from their most recent checkpoints. In contrast to previous proposals, however, SGuard performs checkpoints asynchronously: operators continue processing streams during the checkpoint, which reduces the potential disruption caused by the checkpointing activity. Additionally, SGuard saves the checkpointed state into a new type of distributed and replicated file system (DFS) such as GFS [22] or HDFS [9], leaving more memory resources available for normal stream processing. To manage resource contention due to simultaneous checkpoints by diffe...
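The rollback-recovery idea described in the abstract (periodic checkpoints, continued processing while a checkpoint is written, restart from the most recent checkpoint) can be illustrated with a minimal Python sketch. This is not SGuard's implementation, which the abstract does not detail. The `CountingOperator` class, the file naming scheme, and the `demo` function are hypothetical, and a local temporary directory stands in for the replicated file system.

```python
import json
import os
import tempfile
import threading


class CountingOperator:
    """Hypothetical stateful operator: counts occurrences of each key."""

    def __init__(self, checkpoint_dir):
        self.checkpoint_dir = checkpoint_dir  # stand-in for a DFS path (assumption)
        self.state = {}
        self.processed = 0  # number of tuples applied to the state so far
        self._lock = threading.Lock()

    def process(self, key):
        with self._lock:
            self.state[key] = self.state.get(key, 0) + 1
            self.processed += 1

    def checkpoint_async(self):
        """Copy the state under a short lock, then write it in the background."""
        with self._lock:
            snapshot = {"processed": self.processed, "state": dict(self.state)}
        writer = threading.Thread(target=self._write, args=(snapshot,))
        writer.start()
        return writer

    def _write(self, snapshot):
        name = f"ckpt-{snapshot['processed']:012d}.json"
        final = os.path.join(self.checkpoint_dir, name)
        tmp = final + ".tmp"
        with open(tmp, "w") as f:
            json.dump(snapshot, f)
        os.replace(tmp, final)  # rename so readers never see a partial checkpoint

    @classmethod
    def recover(cls, checkpoint_dir):
        """Rebuild an operator from the most recent complete checkpoint."""
        op = cls(checkpoint_dir)
        names = sorted(n for n in os.listdir(checkpoint_dir) if n.endswith(".json"))
        if names:
            with open(os.path.join(checkpoint_dir, names[-1])) as f:
                snapshot = json.load(f)
            op.state = snapshot["state"]
            op.processed = snapshot["processed"]
        return op


def demo():
    stream = ["a", "b", "a", "c", "a", "b"]
    with tempfile.TemporaryDirectory() as d:
        op = CountingOperator(d)
        for key in stream[:4]:
            op.process(key)
        writer = op.checkpoint_async()
        op.process(stream[4])  # processing continues while the checkpoint is written
        writer.join()
        # Simulate a failure: discard `op`, restart from the last checkpoint,
        # and replay the input tuples that arrived after that checkpoint.
        restored = CountingOperator.recover(d)
        for key in stream[restored.processed:]:
            restored.process(key)
        return restored.state
```

In this sketch the lock is held only while the state dictionary is copied, so the operator is paused briefly rather than for the full duration of the write. Replaying tuples after recovery assumes the input can be re-read from the checkpointed offset, which is an assumption of the example rather than something stated in the abstract.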

YongChul Kwon, Magdalena Balazinska, Albert G. Greenberg

Normal Stream | Normal Stream Processing | PVLDB 2008 | Resource

Post Info

Added	28 Dec 2010
Updated	28 Dec 2010
Type	Journal
Year	2008
Where	PVLDB
Authors	YongChul Kwon, Magdalena Balazinska, Albert G. Greenberg
