Sciweavers

1933 search results - page 279 / 387
» High-performance computing using accelerators
Sort
View
HPCA
2000
IEEE
14 years 1 months ago
Improving the Throughput of Synchronization by Insertion of Delays
Efficiency of synchronization mechanisms can limit the parallel performance of many shared-memory applications. In addition, the ever increasing performance gap between processor...
Ravi Rajwar, Alain Kägi, James R. Goodman
HPCA
1998
IEEE
14 years 1 months ago
Comparative Evaluation of Latency Tolerance Techniques for Software Distributed Shared Memory
A key challenge in achieving high performance on software DSM systems is overcoming their relatively large communication latencies. In this paper, we consider two techniques which...
Todd C. Mowy, Charles Q. C. Chan, Adley K. W. Lo
PPOPP
1997
ACM
14 years 1 months ago
Compiling Dynamic Mappings with Array Copies
Array remappings are useful to many applications on distributed memory parallel machines. They are available in High Performance Fortran, a Fortran-based data-parallel language. T...
Fabien Coelho
ICS
1989
Tsinghua U.
14 years 1 months ago
Control flow optimization for supercomputer scalar processing
Control intensive scalar programs pose a very different challenge to highly pipelined supercomputers than vectorizable numeric applications. Function call/return and branch instru...
Pohua P. Chang, Wen-mei W. Hwu
DAC
2005
ACM
13 years 11 months ago
Keeping hot chips cool
With 90nm CMOS in production and 65nm testing in progress, power has been pushed to the forefront of design metrics. This paper will outline practical techniques that are used to ...
Ruchir Puri, Leon Stok, Subhrajit Bhattacharya