Sciweavers

839 search results - page 4 / 168
» Communication Optimizations for Parallel Computing Using Dat...
ASPLOS
2009
ACM
14 years 8 months ago
3D finite difference computation on GPUs using CUDA
In this paper we describe a GPU parallelization of the 3D finite difference computation using CUDA. Data access redundancy is used as the metric to determine the optimal implement...
Paulius Micikevicius
PPOPP
1995
ACM
13 years 11 months ago
A Model and Compilation Strategy for Out-of-Core Data Parallel Programs
It is widely acknowledged in high-performance computing circles that parallel input/output needs substantial improvement in order to make scalable computers truly usable. We prese...
Rajesh Bordawekar, Alok N. Choudhary, Ken Kennedy,...
ICDCS
2000
IEEE
13 years 12 months ago
Static and Adaptive Data Replication Algorithms for Fast Information Access in Large Distributed Systems
Creating replicas of frequently accessed objects across a read-intensive network can result in large bandwidth savings which, in turn, can lead to reduction in user response time....
Thanasis Loukopoulos, Ishfaq Ahmad
NPC
2005
Springer
14 years 29 days ago
Performance Modelling and Optimization of Memory Access on Cellular Computer Architecture Cyclops64
This paper focuses on the Cyclops64 computer architecture and presents an analytical model and performance simulation results for the preloading and loop unrolling approaches to op...
Yanwei Niu, Ziang Hu, Kenneth E. Barner, Guang R. ...
ICS
2001
Tsinghua U.
13 years 12 months ago
Global optimization techniques for automatic parallelization of hybrid applications
This paper presents a novel technique to perform global optimization of communication and preprocessing calls in the presence of array accesses with arbitrary subscripts. Our sche...
Dhruva R. Chakrabarti, Prithviraj Banerjee