A cache oblivious matrix transposition algorithm is implemented and analyzed using simulation and hardware performance counters. Contrary to its name, the cache oblivious matrix tr...
D. Tsifakis, Alistair P. Rendell, Peter E. Strazdi...
In this paper, we consider the design of caching infrastructure to enhance the client-perceived performance of mobile wireless clients retrieving multimedia objects from the Int...
Hazem Gomaa, Geoffrey G. Messier, Robert J. Davies...
Current data cache organizations fail to deliver high performance in scalar processors for many vector applications. There are two main reasons for this loss of performance: the u...
This paper makes the case for the use of XOR-based placement functions for cache memories. It shows that these XOR-mapping schemes can eliminate many conflict misses for direct-ma...
Modern chip-level multiprocessors (CMPs) contain multiple processor cores sharing a common last-level cache, memory interconnects, and other hardware resources. Workloads running ...
Richard West, Puneet Zaroo, Carl A. Waldspurger, X...