We present a family of methods for speeding up distributed locks by exploiting the uneven distribution of both temporal and spatial locality of access behaviour of many applicatio...
Abstract--The lag of parallel programming models and languages behind the advance of heterogeneous many-core processors has left a gap between the computational capability of moder...
This paper introduces a compiler-orchestrated prefetching system as a unified framework geared toward ameliorating the gap between processing speeds and memory access latencies. ...
Rodric M. Rabbah, Hariharan Sandanagobalane, Mongk...
Communication latencies within critical sections constitute a major bottleneck in some classes of emerging parallel workloads. In this paper, we argue for the use of Inferentially...
This paper explores a new technique called coherence decoupling, which breaks a traditional cache coherence protocol into two protocols: a Speculative Cache Lookup (SCL) protocol ...