Sciweavers

923 search results - page 114 / 185
» Shared Memory Performance Profiling
IPPS
2006
IEEE
15 years 8 months ago
Coterminous locality and coterminous group data prefetching on chip-multiprocessors
Due to shared cache contentions and interconnect delays, data prefetching is more critical in alleviating penalties from increasing memory latencies and demands on Chip-Multiproce...
Xudong Shi, Zhen Yang, Jih-Kwon Peir, Lu Peng, Yen...
PDP
2010
IEEE
15 years 6 months ago
hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications
The increasing numbers of cores, shared caches and memory nodes within machines introduces a complex hardware topology. High-performance computing applications now have to carefull...
François Broquedis, Jérôme Cle...
IPPS
1997
IEEE
15 years 6 months ago
DPF: A Data Parallel Fortran Benchmark Suite
We present the Data Parallel Fortran (DPF) benchmark suite, a set of data parallel Fortran codes for evaluating data parallel compilers appropriate for any target parallel architectu...
Y. Charlie Hu, S. Lennart Johnsson, Dimitris Kehag...
HPCA
2012
IEEE
13 years 9 months ago
Decoupled dynamic cache segmentation
The least recently used (LRU) replacement policy performs poorly in the last-level cache (LLC) because temporal locality of memory accesses is filtered by first and second level...
Samira Manabi Khan, Zhe Wang, Daniel A. Jimé...
CF
2006
ACM
15 years 6 months ago
An efficient cache design for scalable glueless shared-memory multiprocessors
Traditionally, cache coherence in large-scale shared-memory multiprocessors has been ensured by means of a distributed directory structure stored in main memory. In this way, the ...
Alberto Ros, Manuel E. Acacio, José M. Garc...