High-performance execution in distributed computing environments often requires careful selection and configuration not only of computers, networks, and other resources but also o...
Steven Fitzgerald, Ian T. Foster, Carl Kesselman, ...
Applications such as parallel computing, online games, and content distribution networks need to run on a set of resources with particular network connection characteristics to ge...
Scientific instruments, such as radio telescopes, colliders, sensor networks, and simulators, generate very high volumes of data streams that scientists analyze to detect and under...
Understanding the communication behavior and network resource usage of parallel applications is critical to achieving high performance and scalability on systems with tens of th...
Ron Brightwell, Kevin T. Pedretti, Kurt B. Ferreir...
We present a new approach to fault tolerance for High Performance Computing systems. Our approach is based on a careful adaptation of the Algorithmic Based Fault Tolerance techniq...
George Bosilca, Remi Delmas, Jack Dongarra, Julien...