Various architectural power reduction techniques have been proposed for on-chip caches in the last decade. In this paper, we first show that these power reduction techniques can b...
Ja Chun Ku, Serkan Ozdemir, Gokhan Memik, Yehea I....
Cache hierarchies have been traditionally designed for usage by a single application, thread or core. As multi-threaded (MT) and multi-core (CMP) platform architectures emerge and...
In this paper, we consider alternate ways of storing a sparse matrix and their effect on computational speed. They involve keeping both the indices and the non-zero elements in t...
There are two important hurdles that restrict the scalability of directory-based shared-memory multiprocessors: the directory memory overhead and the long L2 miss latencies due to ...
We present a family of methods for speeding up distributed locks by exploiting the uneven distribution of both temporal and spatial locality of access behaviour of many applicatio...