Sciweavers

135 search results - page 15 / 27
» Code and Data Transformations for Improving Shared Cache Per...
Sort
View
MICRO
2006
IEEE
145views Hardware» more  MICRO 2006»
14 years 1 months ago
ASR: Adaptive Selective Replication for CMP Caches
The large working sets of commercial and scientific workloads stress the L2 caches of Chip Multiprocessors (CMPs). Some CMPs use a shared L2 cache to maximize the on-chip cache c...
Bradford M. Beckmann, Michael R. Marty, David A. W...
ICS
2009
Tsinghua U.
14 years 2 months ago
Computer generation of fast fourier transforms for the cell broadband engine
The Cell BE is a multicore processor with eight vector accelerators (called SPEs) that implement explicit cache management through direct memory access engines. While the Cell has...
Srinivas Chellappa, Franz Franchetti, Markus P&uum...
CAINE
2006
13 years 8 months ago
A multiobjective evolutionary approach for constrained joint source code optimization
The synergy of software and hardware leads to efficient application expression profile (AEP) not only in terms of execution time and energy but also optimal architecture usage. We...
Naeem Zafar Azeemi
ICPP
1999
IEEE
13 years 11 months ago
A Framework for Interprocedural Locality Optimization Using Both Loop and Data Layout Transformations
There has been much work recently on improving the locality performance of loop nests in scientific programs through the use of loop as well as data layout optimizations. However,...
Mahmut T. Kandemir, Alok N. Choudhary, J. Ramanuja...
IWMM
2011
Springer
270views Hardware» more  IWMM 2011»
12 years 10 months ago
Memory management in NUMA multicore systems: trapped between cache contention and interconnect overhead
Multiprocessors based on processors with multiple cores usually include a non-uniform memory architecture (NUMA); even current 2-processor systems with 8 cores exhibit non-uniform...
Zoltan Majo, Thomas R. Gross