Sciweavers

164 search results - page 8 / 33
» Data distribution for dense factorization on computers with ...
EUROPAR
2010
Springer
13 years 8 months ago
Maestro: Data Orchestration and Tuning for OpenCL Devices
As heterogeneous computing platforms become more prevalent, the programmer must account for complex memory hierarchies in addition to the difficulties of parallel program...
Kyle Spafford, Jeremy S. Meredith, Jeffrey S. Vett...
IPPS
2006
IEEE
14 years 1 month ago
Online strategies for high-performance power-aware thread execution on emerging multiprocessors
Granularity control is an effective means for trading power consumption with performance on dense shared memory multiprocessors, such as multi-SMT and multi-CMP systems. In this p...
Matthew Curtis-Maury, James Dzierwa, Christos D. A...
IPPS
2007
IEEE
14 years 1 month ago
A global address space framework for locality aware scheduling of block-sparse computations
In this paper, we present a mechanism for automatic management of the memory hierarchy, including secondary storage, in the context of a global address space parallel programming ...
Sriram Krishnamoorthy, Ümit V. Çataly...
IPPS
1998
IEEE
13 years 11 months ago
Compiler-Optimization of Implicit Reductions for Distributed Memory Multiprocessors
This paper presents reduction recognition and parallel code generation strategies for distributed-memory multiprocessors. We describe techniques to recognize a broad range of implic...
Bo Lu, John M. Mellor-Crummey
EUROPAR
2007
Springer
14 years 1 month ago
Toward Scalable Matrix Multiply on Multithreaded Architectures
We show empirically that some of the issues that affected the design of linear algebra libraries for distributed memory architectures will also likely affect such libraries for s...
Bryan Marker, Field G. Van Zee, Kazushige Goto, Gr...