Sciweavers

41 search results - page 3 / 9
» Exploiting program cyclic behavior to reduce memory latency ...
Sort
View
HPCA
2006
IEEE
16 years 4 months ago
A decoupled KILO-instruction processor
Building processors with large instruction windows has been proposed as a mechanism for overcoming the memory wall, but finding a feasible and implementable design has been an elu...
Miquel Pericàs, Adrián Cristal, Rube...
CASES
2003
ACM
15 years 9 months ago
Exploiting bank locality in multi-bank memories
Bank locality can be defined as localizing the number of load/store accesses to a small set of memory banks at a given time. An optimizing compiler can modify a given input code t...
Guilin Chen, Mahmut T. Kandemir, Hendra Saputra, M...
ECRTS
2007
IEEE
15 years 10 months ago
WCET-Directed Dynamic Scratchpad Memory Allocation of Data
Many embedded systems feature processors coupled with a small and fast scratchpad memory. To the difference with caches, allocation of data to scratchpad memory must be handled by...
Jean-François Deverge, Isabelle Puaut
EUROPAR
2009
Springer
15 years 8 months ago
Fast and Efficient Synchronization and Communication Collective Primitives for Dual Cell-Based Blades
The Cell Broadband Engine (Cell BE) is a heterogeneous multi-core processor specifically designed to exploit thread-level parallelism. Its memory model comprehends a common shared ...
Epifanio Gaona, Juan Fernández, Manuel E. A...
ASPDAC
2004
ACM
107views Hardware» more  ASPDAC 2004»
15 years 9 months ago
Fast, predictable and low energy memory references through architecture-aware compilation
The design of future high-performance embedded systems is hampered by two problems: First, the required hardware needs more energy than is available from batteries. Second, curren...
Peter Marwedel, Lars Wehmeyer, Manish Verma, Stefa...