We present an auto-tuning approach to optimize application performance on emerging multicore architectures. The methodology extends the idea of searchbased performance optimizatio...
Samuel Williams, Jonathan Carter, Leonid Oliker, J...
The real-time image forming in future, high-end synthetic aperture radar systems is an example of an application that puts new demands on computer architectures. The initial quest...
Anders Ahlander, H. Hellsten, K. Lind, J. Lindgren...
Good spatial locality alleviates both the latency and bandwidth problem of memory by boosting the effect of prefetching and improving the utilization of cache. However, convention...
Xiaoming Gu, Ian Christopher, Tongxin Bai, Chengli...
The growing processor/memory performance gap causes the performance of many codes to be limited by memory accesses. If known to exist in an application, strided memory accesses fo...
Tushar Mohan, Bronis R. de Supinski, Sally A. McKe...
Many parallel systems offer a simple view of memory: all storage cells are addresseduniformly. Despite a uniform view of the memory, the machines differsignificantly in theirmemo...