Abstract. We present new performance models and a new, more compact data structure for cache blocking when applied to the sparse matrixvector multiply (SpM×V) operation, y ← y +...
Rajesh Nishtala, Richard W. Vuduc, James Demmel, K...
This paper presents hardware and software mechanisms to enable concurrent direct network access (CDNA) by operating systems running within a virtual machine monitor. In a conventi...
Jeffrey Shafer, David Carr, Aravind Menon, Scott R...
Recent technological advances have produced network interfaces that provide users with very low-latency access to the memory of remote machines. We examine the impact of such netw...
Leonidas I. Kontothanassis, Galen C. Hunt, Robert ...
Retrieving sequential rich media content from modern commodity disks is a challenging task. As disk capacity increases, there is a need to increase the number of streams that are ...
George Panagiotakis, Michail Flouris, Angelos Bila...
We present a general scheme for virtualizing main memory errorcorrection mechanisms, which map redundant information needed to correct errors into the memory namespace itself. We ...