Two schemes proposed to cope with unrecoverable or latent media errors and enhance the reliability of RAID systems are examined. The first scheme is the established, widely used d...
Ilias Iliadis, Robert Haas, Xiao-Yu Hu, Evangelos ...
Supercomputers are prone to frequent faults that adversely affect their performance, reliability and functionality. System logs collected on these systems are a valuable resource o...
Distributed file systems that use multiple servers to store data in parallel are becoming commonplace. Much work has already gone into such systems to maximize data throughput....
Nawab Ali, Ananth Devulapalli, Dennis Dalessandro,...
Software testing to produce reliable and robust software has become vitally important in recent years. Testing is a process by which software quality can be assured through the co...
Jonathan Misurda, James A. Clause, Juliya L. Reed,...
A common pattern in scientific computing involves the execution of many tasks that are coupled only in the sense that the output of one may be passed as input to one or more other...
Yong Zhao, Mihael Hategan, Ben Clifford, Ian T. Fo...