Sciweavers

141 search results - page 24 / 29
» Load Execution Latency Reduction
ICPP
1999
IEEE
15 years 6 months ago
Producer-Push - A Protocol Enhancement to Page-Based Software Distributed Shared Memory Systems
This paper describes a technique called producer-push that enhances the performance of a page-based software distributed shared memory system. Shared data, in software DSM systems...
Sven Karlsson, Mats Brorsson
MICRO
1997
IEEE
15 years 6 months ago
Run-Time Spatial Locality Detection and Optimization
As the disparity between processor and main memory performance grows, the number of execution cycles spent waiting for memory accesses to complete also increases. As a result, lat...
Teresa L. Johnson, Matthew C. Merten, Wen-mei W. H...
ISCA
1995
IEEE
15 years 6 months ago
Dynamic Self-Invalidation: Reducing Coherence Overhead in Shared-Memory Multiprocessors
This paper introduces dynamic self-invalidation (DSI), a new technique for reducing cache coherence overhead in shared-memory multiprocessors. DSI eliminates invalidation messages...
Alvin R. Lebeck, David A. Wood
ISCA
1995
IEEE
15 years 6 months ago
Streamlining Data Cache Access with Fast Address Calculation
For many programs, especially integer codes, untolerated load instruction latencies account for a significant portion of total execution time. In this paper, we present the desig...
Todd M. Austin, Dionisios N. Pnevmatikatos, Gurind...
EUROPAR
2009
Springer
15 years 9 months ago
Adaptive Parallel Householder Bidiagonalization
With the increasing use of large image and video archives and high-resolution multimedia data streams in many of today’s research and application areas, there is a growing need f...
Fangbin Liu, Frank J. Seinstra