Performance monitoring of large scale parallel computers creates a dilemma: we need to collect detailed information to find performance bottlenecks, yet collecting all this data ...
Abstract. In this paper, we describe DyRecT (Dynamic Reconfiguration Toolkit) a software library that allows programmers to develop adaptively parallel message-passing MPI program...
Due to fundamental physical limitations and power constraints, we are witnessing a radical change in commodity microprocessor architectures to multicore designs. Continued perform...
— Large Clusters, high availability clusters and Grid deployments often suffer from network, node or operating system faults and thus require the use of fault tolerant programmin...
We consider the problem of maximizing the reliability of a series-parallel system given cost and weight constraints on the system. The number of components in each subsystem and th...