Sciweavers

CLUSTER
2011
IEEE
12 years 9 months ago
Performance Characterization and Optimization of Atomic Operations on AMD GPUs
Atomic operations are important building blocks in supporting general-purpose computing on graphics processing units (GPUs). For instance, they can be used to coordinate executi...
Marwa Elteir, Heshan Lin, Wu-chun Feng
EUROPAR
2000
Springer
14 years 23 days ago
Use of Performance Technology for the Management of Distributed Systems
This paper describes a toolset, PACE, that provides detailed predictive performance information throughout the implementation and execution stages of an application. It is structur...
Darren J. Kerbyson, John S. Harper, Efstathios Pap...
PDP
2009
IEEE
14 years 4 months ago
High Throughput Intra-Node MPI Communication with Open-MX
The increasing number of cores per node in high-performance computing requires an efficient intra-node MPI communication subsystem. Most existing MPI implementations rel...
Brice Goglin
ISLPED
1999
ACM
14 years 1 months ago
Using dynamic cache management techniques to reduce energy in a high-performance processor
In this paper, we propose a technique that uses an additional mini cache, the L0-Cache, located between the instruction cache (I-Cache) and the CPU core. This mechanism can provid...
Nikolaos Bellas, Ibrahim N. Hajj, Constantine D. P...
IPPS
2002
IEEE
14 years 2 months ago
Efficient Pipelining of Nested Loops: Unroll-and-Squash
The size and complexity of current custom VLSI have forced the use of high-level programming languages to describe hardware, and compiler and synthesis technology to map abstract designs ...
Darin Petkov, Randolph E. Harr, Saman P. Amarasing...