Workstation clusters are becoming an interesting alternative to dedicated multiprocessors. In this environment, the probability of a failure, during an application's executio...
Fault tolerance is an important issue for large machines with tens or hundreds of thousands of processors. Checkpoint-based methods, currently used on most machines, rollback all ...
Long-latency loads are critical in today's processors due to the ever-increasing speed gap with memory. Not only do these loads block the execution of dependent instructions,...
distributed shared-memory (SDSM) provides the abstraction necessary to run shared-memory applications on cost-effective parallel platforms such as clusters of workstations. Howeve...
The progress towards programming methodologies that simplify the work of the programmer involves automating, whenever possible, activities that are secondary to the main task of d...