Abstract—The increasing number of cores per node in highperformance computing requires an efficient intra-node MPI communication subsystem. Most existing MPI implementations rel...
—Analytical models have been used to estimate optimal values for parameters such as tile sizes in the context of loop nests. However, important algorithms such as fast Fourier tr...
Basilio B. Fraguela, Yevgen Voronenko, Markus P&uu...
As semiconductor processing technology continues to scale down, managing reliability becomes an increasingly difficult challenge in high-performance microprocessor design. Transie...
—Remote atomic memory operations are critical for achieving high-performance synchronization in tightly-coupled systems. Previous approaches to implementing atomic memory operati...
Keith D. Underwood, Michael Levenhagen, K. Scott H...
Data replication is an excellent technique to move and cache data close to users. By replication, data access performance can be improved dramatically. One of the challenges in da...