Sciweavers

3379 search results - page 229 / 676
» Parallel cross-entropy optimization
EUROPAR
2010
Springer
15 years 2 months ago
Optimized On-Chip-Pipelined Mergesort on the Cell/B.E.
Limited bandwidth to off-chip main memory is a performance bottleneck in chip multiprocessors for streaming computations, such as Cell/B.E., and this will become even mor...
Rikard Hultén, Christoph W. Kessler, Jö...
PPOPP
2009
ACM
16 years 3 months ago
OpenMP to GPGPU: a compiler framework for automatic translation and optimization
GPGPUs have recently emerged as powerful vehicles for general-purpose high-performance computing. Although a new Compute Unified Device Architecture (CUDA) programming model from N...
Seyong Lee, Seung-Jai Min, Rudolf Eigenmann
CGO
2003
IEEE
15 years 7 months ago
Optimizing Memory Accesses For Spatial Computation
In this paper we present the internal representation and optimizations used by the CASH compiler for improving the memory parallelism of pointer-based programs. CASH uses an SSA-b...
Mihai Budiu, Seth Copen Goldstein
ASPLOS
2008
ACM
15 years 4 months ago
Communication optimizations for global multi-threaded instruction scheduling
The recent shift in the industry towards chip multiprocessor (CMP) designs has brought the need for multi-threaded applications to mainstream computing. As observed in several lim...
Guilherme Ottoni, David I. August
EUROPAR
2009
Springer
15 years 9 months ago
A Case Study of Communication Optimizations on 3D Mesh Interconnects
Optimal network performance is critical to efficient parallel scaling for communication-bound applications on large machines. With wormhole routing, no-load latencies do not increa...
Abhinav Bhatele, Eric J. Bohm, Laxmikant V. Kalé...