Sciweavers

1156 search results - page 149 / 232
» Efficient Barriers for Distributed Shared Memory Computers
Sort
View
IPPS
2006
IEEE
14 years 3 months ago
Coterminous locality and coterminous group data prefetching on chip-multiprocessors
Due to shared cache contentions and interconnect delays, data prefetching is more critical in alleviating penalties from increasing memory latencies and demands on Chip-Multiproce...
Xudong Shi, Zhen Yang, Jih-Kwon Peir, Lu Peng, Yen...
ICPADS
2000
IEEE
14 years 1 months ago
A Practical Nonblocking Queue Algorithm Using Compare-and-Swap
Many nonblocking algorithms have been proposed for shared queues. Previous studies indicate that link-based algorithms perform best. However, these algorithms have a memory manage...
Chien-Hua Shann, Ting-Lu Huang, Cheng Chen
SIGGRAPH
2000
ACM
14 years 1 months ago
Pomegranate: a fully scalable graphics architecture
Pomegranate is a parallel hardware architecture for polygon rendering that provides scalable input bandwidth, triangle rate, pixel rate, texture memory and display bandwidth while...
Matthew Eldridge, Homan Igehy, Pat Hanrahan
HPCA
1997
IEEE
14 years 1 months ago
A Performance Comparison of Hierarchical Ring- and Mesh-Connected Multiprocessor Networks
This paper compares the performance of hierarchical ring- and mesh-connected wormhole routed shared memory multiprocessor networks in a simulation study. Hierarchical rings are in...
Govindan Ravindran, Michael Stumm
HPCA
2012
IEEE
12 years 4 months ago
Decoupled dynamic cache segmentation
The least recently used (LRU) replacement policy performs poorly in the last-level cache (LLC) because temporal locality of memory accesses is filtered by first and second level...
Samira Manabi Khan, Zhe Wang, Daniel A. Jimé...