Sciweavers

179 search results - page 9 / 36
» A Fault Detection Service for Wide Area Distributed Computat...
IPPS
2010
IEEE
15 years 1 month ago
A general algorithm for detecting faults under the comparison diagnosis model
We develop a widely applicable algorithm to solve the fault diagnosis problem in certain distributed-memory multiprocessor systems in which there are a limited number of faulty pr...
Iain A. Stewart
VEE
2012
ACM
13 years 11 months ago
SecondSite: disaster tolerance as a service
This paper describes the design and implementation of SecondSite, a cloud-based service for disaster tolerance. SecondSite extends the Remus virtualization-based high availability...
Shriram Rajagopalan, Brendan Cully, Ryan O'Connor,...
WWW
2003
ACM
15 years 9 months ago
WS-Membership - Failure Management in a Web-Services World
An important factor in the successful deployment of federated web-services-based business activities will be the ability to guarantee reliable distributed operation and execution....
Werner Vogels, Christopher Ré
GPC
2007
Springer
15 years 10 months ago
Fault Management in P2P-MPI
We present in this paper the recent developments done in P2P-MPI, a grid middleware, concerning the fault management, which covers fault-tolerance for applications and fault detect...
Stéphane Genaud, Choopan Rattanapoka
KDD
2004
ACM
16 years 4 months ago
Eigenspace-based anomaly detection in computer systems
We report on an automated runtime anomaly detection method at the application layer of multi-node computer systems. Although several network management systems are available in th...
Hisashi Kashima, Tsuyoshi Idé