Sciweavers

65 search results - page 9 / 13
» Reliability Support for On-Chip Memories Using Networks-on-C...
ICDCS
1998
IEEE
13 years 11 months ago
Low-Overhead Protocols for Fault-Tolerant File Sharing
In this paper, we quantify the adverse effect of file sharing on the performance of reliable distributed applications. We demonstrate that file sharing incurs significant overhead...
Lorenzo Alvisi, Sriram Rao, Harrick M. Vin
PPOPP
2006
ACM
14 years 1 month ago
Fast and transparent recovery for continuous availability of cluster-based servers
Recently there has been renewed interest in building reliable servers that support continuous application operation. Besides maintaining system state consistent after a failure, o...
Rosalia Christodoulopoulou, Kaloian Manassiev, Ang...
KBSE
2005
IEEE
14 years 1 month ago
Locating faulty code using failure-inducing chops
Software debugging is the process of locating and correcting faulty code. Prior techniques to locate faulty code either use program analysis techniques such as backward dynamic pr...
Neelam Gupta, Haifeng He, Xiangyu Zhang, Rajiv Gup...
ICS
2004
Tsinghua U.
14 years 25 days ago
Adaptive incremental checkpointing for massively parallel systems
Given the scale of massively parallel systems, occurrence of faults is no longer an exception but a regular event. Periodic checkpointing is becoming increasingly important in the...
Saurabh Agarwal, Rahul Garg, Meeta Sharma Gupta, J...
EUROPAR
2009
Springer
14 years 2 months ago
PSPIKE: A Parallel Hybrid Sparse Linear System Solver
The availability of large-scale computing platforms comprised of tens of thousands of multicore processors motivates the need for the next generation of highly scalable sparse line...
Murat Manguoglu, Ahmed H. Sameh, Olaf Schenk