There is currently considerable enthusiasm around the MapReduce (MR) paradigm for large-scale data analysis [17]. Although the basic control flow of this framework has existed in ...
Andrew Pavlo, Erik Paulson, Alexander Rasin, Danie...
Abstract. Prefetching transfers a data item in advance from its storage location to its usage location so that communication is hidden and does not delay computation. We present a ...
Michael Klemm, Jean Christophe Beyler, Ronny T. La...
Despite a burgeoning demand for parallel programs, the tools available to developers working on shared-memory multicore processors have lagged behind. One reason for this is the l...
Marek Olszewski, Qin Zhao, David Koh, Jason Ansel,...
Using Linux for high-performance applications on the compute nodes of IBM Blue Gene/P is challenging because of TLB misses and difficulties with programming the network DMA engine...
Kazutomo Yoshii, Kamil Iskra, Harish Naik, Pete Be...
As the gap between processor and memory continues to grow, memory performance becomes a key performance bottleneck for many applications. Compilers therefore increasingly seek to m...
Sandya S. Mannarswamy, Ramaswamy Govindarajan, Ris...