Sciweavers

80 search results - page 4 / 16
» Impact of Failure on Interconnection Networks for Large Stor...
DSN
2009
IEEE
13 years 11 months ago
Evaluating the Impact of Undetected Disk Errors in RAID Systems
Despite the reliability of modern disks, recent studies have made it clear that a new class of faults, Undetected Disk Errors (UDEs), also known as silent data corruption events, b...
Eric Rozier, Wendy Belluomini, Veera Deenadhayalan...
FMCO
2007
Springer
14 years 1 months ago
Self Management for Large-Scale Distributed Systems: An Overview of the SELFMAN Project
As Internet applications become larger and more complex, the task of managing them becomes overwhelming. “Abnormal” events such as software updates, failures, attacks, and hots...
Peter Van Roy, Seif Haridi, Alexander Reinefeld, J...
HPDC
2009
IEEE
14 years 2 months ago
Interconnect Agnostic Checkpoint/Restart in Open MPI
Long running High Performance Computing (HPC) applications at scale must be able to tolerate inevitable faults if they are to harness current and future HPC systems. Message Passi...
Joshua Hursey, Timothy Mattox, Andrew Lumsdaine
MSS
2007
IEEE
14 years 1 months ago
Hybrid Host/Network Topologies for Massive Storage Clusters
The high demand for large scale storage capacity calls for the availability of massive storage solutions with high performance interconnects. Although cluster file systems are rap...
Asha Andrade, Ungzu Mun, Dong Hwan Chung, Alexande...
SRDS
2006
IEEE
14 years 1 months ago
Topology Sensitive Replica Selection
As the disks typically found in personal computers grow larger, protecting data by replicating it on a collection of “peer” systems rather than on dedicated high performance s...
Dmitry Brodsky, Michael J. Feeley, Norman C. Hutch...