Sciweavers

» Scalable failure recovery for high-performance data aggregat...
ASPLOS
2004
ACM
14 years 4 months ago
Scalable selective re-execution for EDGE architectures
Pipeline flushes are becoming increasingly expensive in modern microprocessors with large instruction windows and deep pipelines. Selective re-execution is a technique that can r...
Rajagopalan Desikan, Simha Sethumadhavan, Doug Bur...
CLUSTER
2004
IEEE
14 years 2 months ago
FTC-Charm++: an in-memory checkpoint-based fault tolerant runtime for Charm++ and MPI
As high performance clusters continue to grow in size, the mean time between failure shrinks. Thus, the issues of fault tolerance and reliability are becoming one of the challengi...
Gengbin Zheng, Lixia Shi, Laxmikant V. Kalé
AMW
2010
14 years 10 days ago
Multiresolution Cube Estimators for Sensor Network Aggregate Queries
In this work we present in-network techniques to improve the efficiency of spatial aggregate queries. Such queries are very common in a sensornet setting, demanding more targeted t...
Alexandra Meliou, Carlos Guestrin, Joseph M. Helle...
CCGRID
2006
IEEE
14 years 5 months ago
Proposal of MPI Operation Level Checkpoint/Rollback and One Implementation
With the increasing number of processors in modern HPC (High Performance Computing) systems, there are two emergent problems to solve. One is scalability, the other is fault tolera...
Yuan Tang, Graham E. Fagg, Jack Dongarra
CISS
2008
IEEE
14 years 5 months ago
Overlay protection against link failures using network coding
This paper introduces a network coding-based protection scheme against single and multiple link failures. The proposed strategy makes sure that in a connection, each nod...
Ahmed E. Kamal, Aditya Ramamoorthy