Sciweavers

392 search results for "Fault Tolerance in a DSM Cluster Operating System" - page 54 / 79
SRDS
1998
IEEE
15 years 8 months ago
Optimization of a Real-Time Primary-Backup Replication Service
The primary-backup replication model is one of the commonly adopted approaches to providing fault tolerant data services. Its extension to the real-time environment, however, impo...
Hengming Zou, Farnam Jahanian
SOSP
1997
ACM
15 years 5 months ago
Distributed Schedule Management in the Tiger Video Fileserver
Tiger is a scalable, fault-tolerant video file server constructed from a collection of computers connected by a switched network. All content files are striped across all of the c...
William J. Bolosky, Robert P. Fitzgerald, John R. ...
ICPP
2007
IEEE
15 years 10 months ago
A Meta-Learning Failure Predictor for Blue Gene/L Systems
The demand for more computational power in science and engineering has spurred the design and deployment of ever-growing cluster systems. Even though the individual components use...
Prashasta Gujrati, Yawei Li, Zhiling Lan, Rajeev T...
HCW
1998
IEEE
15 years 8 months ago
CCS Resource Management in Networked HPC Systems
CCS is a resource management system for parallel high-performance computers. At the user level, CCS provides vendor-independent access to parallel systems. At the system administr...
Axel Keller, Alexander Reinefeld
ICECCS
2010
IEEE
15 years 4 months ago
Towards Self-Healing Swarm Robotic Systems Inspired by Granuloma Formation
Granuloma is a medical term for a ball-like collection of immune cells that attempts to remove foreign substances from a host organism. This response is a special type o...
Amelia Ritahani Ismail, Jon Timmis