Untolerated load instruction latencies often have a significant impact on overall program performance. As one means of mitigating this effect, we present an aggressive hardware-b...
This paper presents a multiprocessor system on FPGA that adopts Direct Memory Access (DMA) mechanisms to move data between the external memory and the local memory of each process...
Mutual exclusion locks limit concurrency but offer low latency. Software transactional memory (STM) typically has higher latency, but scales well. In this paper we propose transac...
Luke Dalessandro, David Dice, Michael L. Scott, Ni...
As peer-to-peer systems are evolving from simplistic application specific overlays to middleware platforms hosting a range of potential applications it has become evident that incr...
Gareth Tyson, Paul Grace, Andreas Mauthe, Gordon S...
As the number of I/O-intensive MPI programs becomes increasingly large, many efforts have been made to improve I/O performance, on both software and architecture sides. On the sof...