Large–scale parallel applications performing global synchronization may spend a significant amount of execution time waiting for the completion of a barrier operation. Conseque...
This paper presents the design and implementation of an asynchronous data-staging strategy for file accesses based on ROMIO, the most popular MPI-IO distribution, and ZeptoOS, an ...
It is an important problem to map virtual parallel processes to physical processors (or cores) in an optimized way to get scalable performance due to non-uniform communication cost...
Jin Zhang, Jidong Zhai, Wenguang Chen, Weimin Zhen...
Scheduling parallel jobs has been an active investigation area. The scheduler has to deal with heterogeneous workloads and try to obtain throughputs and response times such that en...
—Achieving high-performance message passing on top of generic ETHERNET hardware suffers from the NIC interruptdriven model where coalescing is usually involved. We present an in-...