Sciweavers

286 search results - page 55 / 58
» Cache Remapping to Improve the Performance of Tiled Algorith...
Sort
View
MICRO
2010
IEEE
153views Hardware» more  MICRO 2010»
13 years 5 months ago
Throughput-Effective On-Chip Networks for Manycore Accelerators
As the number of cores and threads in manycore compute accelerators such as Graphics Processing Units (GPU) increases, so does the importance of on-chip interconnection network des...
Ali Bakhoda, John Kim, Tor M. Aamodt
SIGCOMM
2009
ACM
14 years 1 months ago
SmartRE: an architecture for coordinated network-wide redundancy elimination
Application-independent Redundancy Elimination (RE), or identifying and removing repeated content from network transfers, has been used with great success for improving network pe...
Ashok Anand, Vyas Sekar, Aditya Akella
CODES
2008
IEEE
14 years 1 months ago
Scratchpad allocation for concurrent embedded software
Software-controlled scratchpad memory is increasingly employed in embedded systems as it offers better timing predictability compared to caches. Previous scratchpad allocation alg...
Vivy Suhendra, Abhik Roychoudhury, Tulika Mitra
CAINE
2006
13 years 8 months ago
A multiobjective evolutionary approach for constrained joint source code optimization
The synergy of software and hardware leads to efficient application expression profile (AEP) not only in terms of execution time and energy but also optimal architecture usage. We...
Naeem Zafar Azeemi
ICS
2007
Tsinghua U.
14 years 1 months ago
Optimization of data prefetch helper threads with path-expression based statistical modeling
This paper investigates helper threads that improve performance by prefetching data on behalf of an application’s main thread. The focus is data prefetch helper threads that lac...
Tor M. Aamodt, Paul Chow