Sciweavers

» Optimizing Memory System Performance for Communication in Pa...
SPAA
1992
ACM
14 years 7 hours ago
Subset Barrier Synchronization on a Private-Memory Parallel System
A global barrier synchronizes all processors in a parallel system. This paper investigates algorithms that allow disjoint subsets of processors to synchronize independently and in...
Anja Feldmann, Thomas R. Gross, David R. O'Hallaro...
IPPS
1997
IEEE
13 years 11 months ago
A Fast Scalable Universal Matrix Multiplication Algorithm on Distributed-Memory Concurrent Computers
We present a fast and scalable matrix multiplication algorithm on distributed memory concurrent computers, whose performance is independent of data distribution on processors, and...
J. Choi
HPCA
2003
IEEE
14 years 8 months ago
Dynamic Voltage Scaling with Links for Power Optimization of Interconnection Networks
Originally developed to connect processors and memories in multicomputers, prior research and design of interconnection networks have focused largely on performance. As these netw...
Li Shang, Li-Shiuan Peh, Niraj K. Jha
ASPLOS
2010
ACM
14 years 23 days ago
An asymmetric distributed shared memory model for heterogeneous parallel systems
Heterogeneous computing combines general purpose CPUs with accelerators to efficiently execute both sequential control-intensive and data-parallel phases of applications. Existin...
Isaac Gelado, Javier Cabezas, Nacho Navarro, John ...
ASPLOS
1996
ACM
14 years 3 days ago
An Integrated Compile-Time/Run-Time Software Distributed Shared Memory System
On a distributed memory machine, hand-coded message passing leads to the most efficient execution, but it is difficult to use. Parallelizing compilers can approach the performance...
Sandhya Dwarkadas, Alan L. Cox, Willy Zwaenepoel