DOALL loops are tiled to exploit DOALL parallelism and data locality on GPUs. In contrast, due to loop-carried dependences, DOACROSS loops must be skewed first in order to make ti...
Cache misses form a major bottleneck for memory-intensive applications, due to the significant latency of main memory accesses. Loop tiling, in conjunction with other program tran...
In this paper, we investigate the power implications of tile size selection for tile-based processors. We refer to this investigation as a tile granularity study. This is accompli...
John Oliver, Ravishankar Rao, Michael Brown, Jenni...