Sciweavers

126 search results - page 11 / 26
» The cost of fault tolerance in multi-party communication com...
Sort
View
CCGRID
2006
IEEE
14 years 1 months ago
Proposal of MPI Operation Level Checkpoint/Rollback and One Implementation
With the increasing number of processors in modern HPC(High Performance Computing) systems, there are two emergent problems to solve. One is scalability, the other is fault tolera...
Yuan Tang, Graham E. Fagg, Jack Dongarra
ICDCS
2012
IEEE
11 years 10 months ago
Combining Partial Redundancy and Checkpointing for HPC
Today’s largest High Performance Computing (HPC) systems exceed one Petaflops (1015 floating point operations per second) and exascale systems are projected within seven years...
James Elliott, Kishor Kharbas, David Fiala, Frank ...
NOCS
2007
IEEE
14 years 1 months ago
Towards Open Network-on-Chip Benchmarks
Measuring and comparing performance, cost, and other features of advanced communication architectures for complex multi core/multiprocessor systems on chip is a significant challe...
Cristian Grecu, André Ivanov, Partha Pratim...
VTC
2007
IEEE
109views Communications» more  VTC 2007»
14 years 1 months ago
A Reliability-Aware LDPC Code Decoding Algorithm
— With the continuing downscaling of microelectronic technology, chip reliability becomes a great threat to the design of future complex microelectronic systems. Hence increasing...
Matthias Alles, Torben Brack, Norbert Wehn
HASE
1999
IEEE
13 years 12 months ago
Building Dependable Distributed Applications Using AQUA
Building dependable distributed systems using ad hoc methods is a challenging task. Without proper support, an application programmer must face the daunting requirement of having ...
Jennifer Ren, Michel Cukier, Paul Rubel, William H...