Optimal network performance is critical to efficient parallel scaling for communication-bound applications on large machines. With wormhole routing, no-load latencies do not increa...
Abhinav Bhatele, Eric J. Bohm, Laxmikant V. Kal&ea...
Recently, many large-scale computer systems have been built to meet the high storage and processing demands of compute- and data-intensive applications. MapReduce is one of the mo...
Porting complex MPI applications involving collective communications to grids requires significant program modification, usually dedicated to a single grid structure. The diffi...
Abstract. In this paper, we introduce a new parallel variant of the LLL lattice basis reduction algorithm. Our new, multi-threaded algorithm is the first to provide an efficient,...
Mapping virtual parallel processes to physical processors (or cores) in an optimized way is an important problem for obtaining scalable performance, due to non-uniform communication cost...
Jin Zhang, Jidong Zhai, Wenguang Chen, Weimin Zhen...