In the past decade, processor speed has become significantly faster than memory speed. Small, fast cache memories are designed to overcome this discrepancy, but they are only effe...
We present an auto-tuning approach to optimize application performance on emerging multicore architectures. The methodology extends the idea of searchbased performance optimizatio...
Samuel Williams, Jonathan Carter, Leonid Oliker, J...
This paper presents a parallel framework for simulating fluids with the Smoothed Particle Hydrodynamics (SPH) method. For low computational costs per simulation step, efficient ...
Markus Ihmsen, Nadir Akinci, Markus Becker, Matthi...
—Sparse Matrix-Vector multiplication (SpMV) is a very challenging computational kernel, since its performance depends greatly on both the input matrix and the underlying architec...
Vasileios Karakasis, Georgios I. Goumas, Nectarios...
Loop fusion is important to optimizing compilers because it is an important tool in managing the memory hierarchy. By fusing loops that use the same data elements, we can reduce t...