Blocking is a well-known optimization technique for improving the effectiveness of memory hierarchies. Instead of operating on entire rows or columns of an array, blocked algorith...
Monica S. Lam, Edward E. Rothberg, Michael E. Wolf
Conventional cache models are not suited for real-time parallel processing because tasks may flush each other’s data out of the cache in an unpredictable manner. In this way th...
Anca Mariana Molnos, Marc J. M. Heijligers, Sorin ...
We present a cache oblivious algorithm for stencil computations, which arise for example in finite-difference methods. Our algorithm applies to arbitrary stencils in n-dimension...
The memory hierarchy of a system can consume up to 50% of microprocessor system power. Previous work has shown that tuning a configurable cache to a particular application can red...
The ever increasing sizes of on-chip caches and the growing domination of wire delay necessitate significant changes to cache hierarchy design methodologies. Many recent proposal...