Dining philosophers is a classic scheduling problem for local mutual exclusion on arbitrary conflict graphs. We establish necessary conditions to solve wait-free dining under even...
Given the scale of massively parallel systems, the occurrence of faults is no longer an exception but a regular event. Periodic checkpointing is becoming increasingly important in the...
Replication is a key technique for improving fault tolerance. Replication can also improve application performance under some circumstances, but can have the opposite effect under...
In the past decade, cluster computing platforms have been widely applied to support a variety of scientific and commercial applications, many of which are parallel in nature. Howev...
CCS is a resource management system for parallel high-performance computers. At the user level, CCS provides vendor-independent access to parallel systems. At the system administr...