Sciweavers

» Optimizing Memory System Performance for Communication in Pa...
IPPS
1998
IEEE
14 years 4 days ago
Vector Prefix and Reduction Computation on Coarse-Grained, Distributed-Memory Parallel Machines
Vector prefix and reduction are collective communication primitives in which all processors must cooperate. We present two parallel algorithms, the direct algorithm and the split ...
Seungjo Bae, Dongmin Kim, Sanjay Ranka
EUROPAR
2009
Springer
13 years 11 months ago
Fast and Efficient Synchronization and Communication Collective Primitives for Dual Cell-Based Blades
The Cell Broadband Engine (Cell BE) is a heterogeneous multi-core processor specifically designed to exploit thread-level parallelism. Its memory model comprehends a common shared ...
Epifanio Gaona, Juan Fernández, Manuel E. A...
CF
2010
ACM
14 years 29 days ago
Exposing parallelism and locality in a runtime parallel optimization framework
Runtime parallel optimization has been suggested as a means to overcome the difficulties of parallel programming. For runtime parallel optimization to be effective, parallelism a...
David A. Penry, Daniel J. Richins, Tyler S. Harris...
HPCA
2002
IEEE
14 years 25 days ago
Fine-Grain Priority Scheduling on Multi-Channel Memory Systems
Configurations of contemporary DRAM memory systems become increasingly complex. A recent study [5] shows that application performance is highly sensitive to choices of configura...
Zhichun Zhu, Zhao Zhang, Xiaodong Zhang
ICPPW
2002
IEEE
14 years 25 days ago
SNOW: Software Systems for Process Migration in High-Performance, Heterogeneous Distributed Environments
This paper reports our experiences on the Scalable Network Of Workstation (SNOW) project, which implements a novel methodology to support user-level process migration for traditio...
Kasidit Chanchio, Xian-He Sun