We present a scalable temporal order analysis technique that supports debugging of large scale applications by classifying MPI tasks based on their logical program execution order...
Dong H. Ahn, Bronis R. de Supinski, Ignacio Laguna...
Algorithmic choice is essential in any problem domain to realizing optimal computational performance. Multigrid is a prime example: not only is it possible to make choices at the ...
Cy P. Chan, Jason Ansel, Yee Lok Wong, Saman P. Am...
Proponents of utility-based scheduling policies have shown the potential for a 100–1400% increase in value-delivered to users when used in lieu of traditional approaches such as...
Existing cache partitioning schemes are designed in a manner oblivious to the implicit processor partitioning enforced by the operating system. This paper examines an operating sy...
Shekhar Srikantaiah, Reetuparna Das, Asit K. Mishr...
Sparse matrix-vector multiplication (SpMV) is of singular importance in sparse linear algebra. In contrast to the uniform regularity of dense linear algebra, sparse operations enc...
In this paper, we present an early performance evaluation of a 624-core cluster based on the Intel® Xeon® Processor 5560 (code named “Nehalem-EP”, and referred to as Xeon 55...
We present ECC FIFO, a mechanism enabling two-tiered last-level cache error protection using an arbitrarily strong tier-2 code without increasing on-chip storage. Instead of addin...