We present a simple and novel framework for generating blocked codes for high-performance machines with a memory hierarchy. Unlike traditional compiler techniques like tiling, whi...
The performance of irregular applications on modern computer systems is hurt by the wide gap between CPU and memory speeds because these applications typically underutilize multi-...
John M. Mellor-Crummey, David B. Whalley, Ken Kenn...
Abstract--One important bottleneck when visualizing large data sets is the data transfer between processor and memory. Cacheaware (CA) and cache-oblivious (CO) algorithms take into...
Sac is a functional array processing language particularly designed with numerical applications in mind. In this field the runtime performance of programs critically depends on the...
This paper gives tight bounds on the cost of cache-oblivious searching. The paper shows that no cache-oblivious search structure can guarantee a search performance of fewer than l...