Sciweavers

216 search results - page 35 / 44
» On High-Bandwidth Data Cache Design for Multi-Issue Processo...
Sort
View
CLUSTER
2006
IEEE
14 years 1 months ago
Designing High Performance and Scalable MPI Intra-node Communication Support for Clusters
As new processor and memory architectures advance, clusters start to be built from larger SMP systems, which makes MPI intra-node communication a critical issue in high performanc...
Lei Chai, Albert Hartono, Dhabaleswar K. Panda
ICS
2010
Tsinghua U.
13 years 9 months ago
Timing local streams: improving timeliness in data prefetching
Data prefetching technique is widely used to bridge the growing performance gap between processor and memory. Numerous prefetching techniques have been proposed to exploit data pa...
Huaiyu Zhu, Yong Chen, Xian-He Sun
NOCS
2007
IEEE
14 years 2 months ago
The Power of Priority: NoC Based Distributed Cache Coherency
The paper introduces Network-on-Chip (NoC) design methodology and low cost mechanisms for supporting efficient cache access and cache coherency in future high-performance Chip Mul...
Evgeny Bolotin, Zvika Guz, Israel Cidon, Ran Ginos...
HPCA
1998
IEEE
14 years 2 days ago
Hardware for Speculative Run-Time Parallelization in Distributed Shared-Memory Multiprocessors
Run-time parallelization is often the only way to execute the code in parallel when data dependence information is incomplete at compile time. This situation is common in many imp...
Ye Zhang, Lawrence Rauchwerger, Josep Torrellas
MICRO
2005
IEEE
110views Hardware» more  MICRO 2005»
14 years 1 months ago
Scalable Store-Load Forwarding via Store Queue Index Prediction
Conventional processors use a fully-associative store queue (SQ) to implement store-load forwarding. Associative search latency does not scale well to capacities and bandwidths re...
Tingting Sha, Milo M. K. Martin, Amir Roth