Sciweavers

442 search results - page 54 / 89
» Parallel programming over ChinaGrid
PPOPP
2009
ACM
14 years 10 months ago
OpenMP to GPGPU: a compiler framework for automatic translation and optimization
GPGPUs have recently emerged as powerful vehicles for general-purpose high-performance computing. Although a new Compute Unified Device Architecture (CUDA) programming model from N...
Seyong Lee, Seung-Jai Min, Rudolf Eigenmann
SIGMOD
2009
ACM
14 years 10 months ago
A comparison of approaches to large-scale data analysis
There is currently considerable enthusiasm around the MapReduce (MR) paradigm for large-scale data analysis [17]. Although the basic control flow of this framework has existed in ...
Andrew Pavlo, Erik Paulson, Alexander Rasin, Danie...
SIGMOD
2010
ACM
14 years 2 months ago
Automatic contention detection and amelioration for data-intensive operations
To take full advantage of the parallelism offered by a multicore machine, one must write parallel code. Writing parallel code is difficult. Even when one writes correct code, the...
John Cieslewicz, Kenneth A. Ross, Kyoho Satsumi, Y...
VLSISP
2008
13 years 9 months ago
Fast Bit Gather, Bit Scatter and Bit Permutation Instructions for Commodity Microprocessors
Advanced bit manipulation operations are not efficiently supported by commodity word-oriented microprocessors. Programming tricks are typically devised to shorten the long sequence...
Yedidya Hilewitz, Ruby B. Lee
EUROPAR
2005
Springer
14 years 3 months ago
PerfMiner: Cluster-Wide Collection, Storage and Presentation of Application Level Hardware Performance Data
Abstract. We present PerfMiner, a system for the transparent collection, storage and presentation of thread-level hardware performance data across an entire cluster. Every sub-proc...
Philip Mucci, Daniel Ahlin, Johan Danielsson, Per ...