Sciweavers

IPPS
2007
IEEE
14 years 1 month ago
A Fault Tolerance Protocol with Fast Fault Recovery
Fault tolerance is an important issue for large machines with tens or hundreds of thousands of processors. Checkpoint-based methods, currently used on most machines, roll back all ...
Sayantan Chakravorty, Laxmikant V. Kalé
IPPS
2007
IEEE
14 years 1 month ago
The Design and Implementation of Checkpoint/Restart Process Fault Tolerance for Open MPI
To be able to fully exploit ever larger computing platforms, modern HPC applications and system software must be able to tolerate inevitable faults. Historically, MPI implementati...
Joshua Hursey, Jeffrey M. Squyres, Timothy Mattox,...
IPPS
2007
IEEE
14 years 1 month ago
Implementing and Evaluating Automatic Checkpointing
As the size and popularity of computer clusters continue to grow, fault tolerance is becoming a crucial factor to ensure high performance and reliability for applications. To provide...
Antonio S. Martins, Ronaldo Augusto Lara Gonç...
ICCCN
2007
IEEE
14 years 1 month ago
Design Techniques for Streamlined Integration and Fault Tolerance in a Distributed Sensor System for Line-crossing Recognition
Distributed sensor system applications (e.g., wireless sensor networks) have been studied extensively in recent years. Such applications involve resource-limited embed...
Chung-Ching Shen, Roni Kupershtok, Shuvra S. Bhatt...
HPDC
2007
IEEE
14 years 1 month ago
Failure-aware checkpointing in fine-grained cycle sharing systems
Fine-Grained Cycle Sharing (FGCS) systems aim at utilizing the large amount of idle computational resources available on the Internet. Such systems allow guest jobs to run on a ho...
Xiaojuan Ren, Rudolf Eigenmann, Saurabh Bagchi
HASE
2007
IEEE
14 years 1 month ago
Advances in Quantum Computing Fault Tolerance and Testing
We study recent developments in quantum computing (QC) testing and fault tolerance (FT) techniques and discuss several attempts to formalize quantum logic fault models. We illustr...
David Y. Feinstein, V. S. S. Nair, Mitchell A. Tho...
ESCIENCE
2007
IEEE
14 years 1 month ago
Intelligent Selection of Fault Tolerance Techniques on the Grid
The emergence of computational grids has led to an increased reliance on task schedulers that can guarantee the completion of tasks that are executed on unreliable systems. There...
Daniel C. Vanderster, Nikitas J. Dimopoulos, Randa...
DSN
2007
IEEE
14 years 1 month ago
Fault Tolerant Approaches to Nanoelectronic Programmable Logic Arrays
Programmable logic arrays (PLA), which can implement arbitrary logic functions in a two-level logic form, are promising as platforms for nanoelectronic logic due to their highly r...
Wenjing Rao, Alex Orailoglu, Ramesh Karri
CCGRID
2007
IEEE
14 years 1 month ago
Executing Large Parameter Sweep Applications on a Multi-VO Testbed
Applications that span multiple virtual organizations (VOs) are of great interest to the eScience community. However, recent attempts to execute large-scale parameter sweep applic...
Shahaan Ayyub, David Abramson, Colin Enticott, Sla...
PRDC
2008
IEEE
14 years 2 months ago
VTV - A Voting Strategy for Real-Time Systems
Real-time applications typically have to satisfy high dependability requirements and require fault tolerance in both value and time domains. A widely used approach to ensure fault...
Hüseyin Aysan, Sasikumar Punnekkat, Radu Dobr...