Sciweavers

» Load Execution Latency Reduction
ICPP
1999
IEEE
14 years 26 days ago
Producer-Push - A Protocol Enhancement to Page-Based Software Distributed Shared Memory Systems
This paper describes a technique called producer-push that enhances the performance of a page-based software distributed shared memory system. Shared data, in software DSM systems...
Sven Karlsson, Mats Brorsson
MICRO
1997
IEEE
14 years 24 days ago
Run-Time Spatial Locality Detection and Optimization
As the disparity between processor and main memory performance grows, the number of execution cycles spent waiting for memory accesses to complete also increases. As a result, lat...
Teresa L. Johnson, Matthew C. Merten, Wen-mei W. H...
ISCA
1995
IEEE
14 years 4 days ago
Dynamic Self-Invalidation: Reducing Coherence Overhead in Shared-Memory Multiprocessors
This paper introduces dynamic self-invalidation (DSI), a new technique for reducing cache coherence overhead in shared-memory multiprocessors. DSI eliminates invalidation messages...
Alvin R. Lebeck, David A. Wood
ISCA
1995
IEEE
14 years 4 days ago
Streamlining Data Cache Access with Fast Address Calculation
For many programs, especially integer codes, untolerated load instruction latencies account for a significant portion of total execution time. In this paper, we present the desig...
Todd M. Austin, Dionisios N. Pnevmatikatos, Gurind...
EUROPAR
2009
Springer
14 years 3 months ago
Adaptive Parallel Householder Bidiagonalization
With the increasing use of large image and video archives and high-resolution multimedia data streams in many of today’s research and application areas, there is a growing need f...
Fangbin Liu, Frank J. Seinstra