Global locality analysis is a technique for improving the cache performance of a sequence of loop nests through a combination of loop and data layout optimizations. Pure loop tran...
Mahmut T. Kandemir, Alok N. Choudhary, J. Ramanuja...
Assembly lines with closed-loop parallel lanes have the potential to continue to be productive when individual stations break down. A requirement in such parallel lane systems is t...
With the widening performance gap between processors and main memory, efficient memory access behavior is necessary for good program performance. Loop partition is an effective...
This paper presents the design and experimental results of a parallel linear equation solver based on the asynchronous partial Gauss-Seidel method. The basic idea of this method is derived from ...
Performance prediction across platforms is increasingly important as developers can choose from a wide range of execution platforms. The main challenge remains to perform accurate...