Sciweavers

215 search results - page 33 / 43
» Optimization Techniques for Parallel Codes of Irregular Scie...
IPPS
1997
IEEE
14 years 19 days ago
The Sparse Cyclic Distribution against its Dense Counterparts
Several methods have been proposed in the literature for the distribution of data on distributed memory machines, either oriented to dense or sparse structures. Many of the real a...
Gerardo Bandera, Manuel Ujaldon, María A. T...
ICS
1995
Tsinghua U.
13 years 12 months ago
Optimum Modulo Schedules for Minimum Register Requirements
Modulo scheduling is an efficient technique for exploiting instruction level parallelism in a variety of loops, resulting in high performance code but increased register requirement...
Alexandre E. Eichenberger, Edward S. Davidson, San...
EUROPAR
2004
Springer
14 years 1 months ago
Exploiting Spatial Store Locality Through Permission Caching in Software DSMs
Abstract. Fine-grained software-based distributed shared memory (SWDSM) systems typically maintain coherence with in-line checking code at load and store operations to shared memor...
Håkan Zeffer, Zoran Radovic, Oskar Grenholm,...
PC
2007
13 years 8 months ago
High performance combinatorial algorithm design on the Cell Broadband Engine processor
The Sony–Toshiba–IBM Cell Broadband Engine (Cell/B.E.) is a heterogeneous multicore architecture that consists of a traditional microprocessor (PPE) with eight SIMD co-process...
David A. Bader, Virat Agarwal, Kamesh Madduri, Seu...
EUROPAR
2009
Springer
14 years 3 months ago
A Case Study of Communication Optimizations on 3D Mesh Interconnects
Optimal network performance is critical to efficient parallel scaling for communication-bound applications on large machines. With wormhole routing, no-load latencies do not increa...
Abhinav Bhatele, Eric J. Bohm, Laxmikant V. Kalé...