Sciweavers

656 search results - page 106 / 132
» Scalable Parallel Matrix Multiplication on Distributed Memor...
Sort
View
IPPS
2003
IEEE
14 years 1 months ago
Leveraging Block Decisions and Aggregation in the ShareStreams QoS Architecture
ShareStreams (Scalable Hardware Architectures for Stream Schedulers) is a canonical architecture for realizing a range of scheduling disciplines. This paper discusses the design c...
Raj Krishnamurthy, Sudhakar Yalamanchili, Karsten ...
PPOPP
2011
ACM
12 years 11 months ago
GRace: a low-overhead mechanism for detecting data races in GPU programs
In recent years, GPUs have emerged as an extremely cost-effective means for achieving high performance. Many application developers, including those with no prior parallel program...
Mai Zheng, Vignesh T. Ravi, Feng Qin, Gagan Agrawa...
EUROPAR
2006
Springer
14 years 9 days ago
Hierarchical Model Validation of Symbolic Performance Models of Scientific Kernels
Multi-resolution validation of hierarchical performance models of scientific applications is critical primarily for two reasons. First, the step-by-step validation determines the c...
Sadaf R. Alam, Jeffrey S. Vetter
HPCA
2006
IEEE
14 years 9 months ago
DMA-aware memory energy management
As increasingly larger memories are used to bridge the widening gap between processor and disk speeds, main memory energy consumption is becoming increasingly dominant. Even thoug...
Vivek Pandey, Weihang Jiang, Yuanyuan Zhou, Ricard...
IPPS
2003
IEEE
14 years 1 months ago
Extending OpenMP to Support Slipstream Execution Mode
OpenMP has emerged as a widely accepted standard for writing shared memory programs. Hardware-specific extensions such as data placement are usually needed to improve the scalabi...
Khaled Z. Ibrahim, Gregory T. Byrd