Sciweavers

357 search results - page 6 / 72
» Fault-Tolerant Aggregation for Dynamic Networks
Sort
View
ICPP
2000
IEEE
13 years 11 months ago
A Problem-Specific Fault-Tolerance Mechanism for Asynchronous, Distributed Systems
The idle computers on a local area, campus area, or even wide area network represent a significant computational resource--one that is, however, also unreliable, heterogeneous, an...
Adriana Iamnitchi, Ian T. Foster
FGCS
2002
153views more  FGCS 2002»
13 years 7 months ago
HARNESS fault tolerant MPI design, usage and performance issues
Initial versions of MPI were designed to work efficiently on multi-processors which had very little job control and thus static process models. Subsequently forcing them to suppor...
Graham E. Fagg, Jack Dongarra
ICMAS
2000
13 years 9 months ago
The Adaptive Agent Architecture: Achieving Fault-Tolerance Using Persistent Broker Teams
Brokers are used in many multi-agent systems for locating agents, for routing and sharing information, for managing the system, and for legal purposes, as independent third partie...
Sanjeev Kumar, Philip R. Cohen, Hector J. Levesque
HIPC
2009
Springer
13 years 5 months ago
Fast checkpointing by Write Aggregation with Dynamic Buffer and Interleaving on multicore architecture
Large scale compute clusters continue to grow to ever-increasing proportions. However, as clusters and applications continue to grow, the Mean Time Between Failures (MTBF) has redu...
Xiangyong Ouyang, Karthik Gopalakrishnan, Tejus Ga...
NSDI
2010
13 years 9 months ago
MapReduce Online
MapReduce is a popular framework for data-intensive distributed computing of batch jobs. To simplify fault tolerance, many implementations of MapReduce materialize the entire outp...
Tyson Condie, Neil Conway, Peter Alvaro, Joseph M....