Sciweavers

2932 search results - page 120 / 587
» Optimizing Memory System Performance for Communication in Pa...
CLUSTER
2000
IEEE
13 years 9 months ago
Block-cyclic redistribution over heterogeneous networks
Clusters of workstations and networked parallel computing systems are emerging as promising computational platforms for HPC applications. The processors in such systems are typica...
Prashanth B. Bhat, Viktor K. Prasanna, Cauligi S. ...
ASPLOS
1996
ACM
14 years 1 month ago
Shasta: A Low Overhead, Software-Only Approach for Supporting Fine-Grain Shared Memory
This paper describes Shasta, a system that supports a shared address space in software on clusters of computers with physically distributed memory. A unique aspect of Shasta compa...
Daniel J. Scales, Kourosh Gharachorloo, Chandramoh...
ASAP
2007
IEEE
14 years 3 months ago
A memcpy Hardware Accelerator Solution for Non Cache-line Aligned Copies
In this paper, we present a hardware solution to perform non cache-line aligned memory copies allowing the commonly used memcpy function to cope with word copies. The main purpose...
Filipa Duarte, Stephan Wong
CLUSTER
2011
IEEE
12 years 9 months ago
Dynamic Load Balance for Optimized Message Logging in Fault Tolerant HPC Applications
Computing systems will grow significantly larger in the near future to satisfy the needs of computational scientists in areas like climate modeling, biophysics and cosmology. S...
Esteban Meneses, Laxmikant V. Kalé, Greg Br...
ICPP
1999
IEEE
14 years 1 month ago
A Framework for Interprocedural Locality Optimization Using Both Loop and Data Layout Transformations
There has been much work recently on improving the locality performance of loop nests in scientific programs through the use of loop as well as data layout optimizations. However,...
Mahmut T. Kandemir, Alok N. Choudhary, J. Ramanuja...