Sparse matrix operations achieve only small fractions of peak CPU speeds because of the use of specialized, indexbased matrix representations, which degrade cache utilization by i...
Given the large communication overheads characteristic of modern parallel machines, optimizations that eliminate, hide or parallelize communication may improve the performance of ...
Most Hardware Transactional Memory (HTM) implementations choose fixed version and conflict management policies at design time. While eager HTM systems store transactional state in-...
Marc Lupon, Grigorios Magklis, Antonio Gonzá...
In this paper we describe our experience with Teapot [7], a domain-specific language for writing cache coherence protocols. Cache coherence is of concern when parallel and distrib...
Satish Chandra, James R. Larus, Michael Dahlin, Br...
This paper presents CMP Cooperative Caching, a unified framework to manage a CMP’s aggregate on-chip cache resources. Cooperative caching combines the strengths of private and ...