GPGPUs have recently emerged as powerful vehicles for generalpurpose high-performance computing. Although a new Compute Unified Device Architecture (CUDA) programming model from N...
There is currently considerable enthusiasm around the MapReduce (MR) paradigm for large-scale data analysis [17]. Although the basic control flow of this framework has existed in ...
Andrew Pavlo, Erik Paulson, Alexander Rasin, Danie...
To take full advantage of the parallelism offered by a multicore machine, one must write parallel code. Writing parallel code is difficult. Even when one writes correct code, the...
John Cieslewicz, Kenneth A. Ross, Kyoho Satsumi, Y...
Advanced bit manipulation operations are not efficiently supported by commodity word-oriented microprocessors. Programming tricks are typically devised to shorten the long sequence...
Abstract. We present PerfMiner, a system for the transparent collection, storage and presentation of thread-level hardware performance data across an entire cluster. Every sub-proc...
Philip Mucci, Daniel Ahlin, Johan Danielsson, Per ...