Sciweavers

947 search results - page 12 / 190
» Algorithms for Distributed Fault Management in Telecommunica...
SASO
2007
IEEE
14 years 1 month ago
e-SAFE: An Extensible, Secure and Fault Tolerant Storage System
With the rapidly falling price of hardware, and increasingly available bandwidth, the storage technology is seeing a paradigm shift from centralized and managed mode to distribute...
Sandip Agarwala, Arnab Paul, Umakishore Ramachandr...
SAC
2006
ACM
13 years 7 months ago
Combining supervised and unsupervised monitoring for fault detection in distributed computing systems
Fast and accurate fault detection is becoming an essential component of management software for mission-critical systems. A good fault detector makes it possible to initiate repair a...
Haifeng Chen, Guofei Jiang, Cristian Ungureanu, Ke...
USENIX
2008
13 years 9 months ago
Improving Scalability and Fault Tolerance in an Application Management Infrastructure
This paper explores the challenges associated with distributed application management in large-scale computing environments. In particular, we investigate several techniques for e...
Nikolay Topilski, Jeannie R. Albrecht, Amin Vahdat
IPPS
2000
IEEE
13 years 11 months ago
Fault Tolerant Wide-Area Parallel Computing
Executing parallel applications across distributed networks introduces the problem of fault tolerance. A viable solution for fault tolerance must keep overhead manageable and not c...
Jon B. Weissman
DSN
2003
IEEE
14 years 21 days ago
Comparison of Failure Detectors and Group Membership: Performance Study of Two Atomic Broadcast Algorithms
Protocols that solve agreement problems are essential building blocks for fault tolerant distributed systems. While many protocols have been published, little has been done to ana...
Péter Urbán, Ilya Shnayderman, André...