Sciweavers

48 search results - page 3 / 10
» Tolerating data access latency with register preloading
HPCC
2009
Springer
13 years 11 months ago
On Instruction-Level Method for Reducing Cache Penalties in Embedded VLIW Processors
Usual cache optimisation techniques for high performance computing are difficult to apply in embedded VLIW applications. First, embedded applications are not always well structur...
Samir Ammenouche, Sid Ahmed Ali Touati, William Ja...
SIAMCOMP
2010
13 years 1 months ago
Fast Access to Distributed Atomic Memory
We study efficient and robust implementations of an atomic read-write data structure over an asynchronous distributed message-passing system made of reader and writer processes, as...
Partha Dutta, Rachid Guerraoui, Ron R. Levy, Marko...
MICRO
1995
IEEE
13 years 10 months ago
Zero-cycle loads: microarchitecture support for reducing load latency
Untolerated load instruction latencies often have a significant impact on overall program performance. As one means of mitigating this effect, we present an aggressive hardware-b...
Todd M. Austin, Gurindar S. Sohi
PPL
2006
13 years 7 months ago
Microthreading a Model for Distributed Instruction-level Concurrency
This paper analyses the micro-threaded model of concurrency making comparisons with both data and instruction-level concurrency. The model is fine grain and provides synchronisati...
Chris R. Jesshope
ICPP
1994
IEEE
13 years 11 months ago
An Integrated Hardware/Software Data Prefetching Scheme for Shared-Memory Multiprocessors
Both hardware and software prefetching have been shown to be effective in tolerating the large memory latencies inherent in shared-memory multiprocessors; however, both types of pre...
Edward H. Gornish, Alexander V. Veidenbaum