Sciweavers

» Optimizing Memory System Performance for Communication in Pa...
HPCA
2002
IEEE
14 years 26 days ago
User-Level Communication in Cluster-Based Servers
Clusters of commodity computers are currently being used to provide the scalability required by several popular Internet services. In this paper we evaluate an efficient cluster-b...
Enrique V. Carrera, Srinath Rao, Liviu Iftode, Ric...
IEEEPACT
2005
IEEE
14 years 1 month ago
HUNTing the Overlap
Hiding communication latency is an important optimization for parallel programs. Programmers or compilers achieve this by using non-blocking communication primitives and overlappi...
Costin Iancu, Parry Husbands, Paul Hargrove
EUROPAR
2009
Springer
14 years 16 days ago
Two-Dimensional Matrix Partitioning for Parallel Computing on Heterogeneous Processors Based on Their Functional Performance Mod
The functional performance model (FPM) of heterogeneous processors has proven to be more realistic than the traditional models because it integrates many important featur...
Alexey L. Lastovetsky, Ravi Reddy
PLDI
2010
ACM
14 years 1 month ago
A GPGPU compiler for memory optimization and parallelism management
This paper presents a novel optimizing compiler for general purpose computation on graphics processing units (GPGPU). It addresses two major challenges of developing high performa...
Yi Yang, Ping Xiang, Jingfei Kong, Huiyang Zhou
USENIX
1994
13 years 9 months ago
TreadMarks: Distributed Shared Memory on Standard Workstations and Operating Systems
TreadMarks is a distributed shared memory (DSM) system for standard Unix systems such as SunOS and Ultrix. This paper presents a performance evaluation of TreadMarks running on Ultr...
Peter J. Keleher, Alan L. Cox, Sandhya Dwarkadas, ...