On large distributed-memory parallel computers, the global communication cost of inner products seriously limits the performance of Krylov subspace methods [3]. We consider improved ...
Synchronization in distributed systems is expensive because, in general, threads must stall to obtain a lock or to operate on volatile data. Transactional memory, on the other hand...
Modern memory systems rely on spatial locality to provide high bandwidth while minimizing memory device power and cost. The trend of increasing the number of cores that share memo...
Min Kyu Jeong, Doe Hyun Yoon, Dam Sunwoo, Mike Sul...
Achieving high performance for out-of-core applications typically involves explicit management of the movement of data between the disk and the physical memory. We are developi...
Sriram Krishnamoorthy, Juan Piernas, Vinod Tippara...
While clusters of commodity servers and switches are the most popular form of large-scale parallel computers, many programs are not easily parallelized for execution upon them. In...
Hanjun Kim, Arun Raman, Feng Liu, Jae W. Lee, Davi...