As the number of cores and threads in manycore compute accelerators such as Graphics Processing Units (GPU) increases, so does the importance of on-chip interconnection network des...
Application-independent Redundancy Elimination (RE), or identifying and removing repeated content from network transfers, has been used with great success for improving network pe...
Software-controlled scratchpad memory is increasingly employed in embedded systems as it offers better timing predictability compared to caches. Previous scratchpad allocation alg...
The synergy of software and hardware leads to efficient application expression profile (AEP) not only in terms of execution time and energy but also optimal architecture usage. We...
This paper investigates helper threads that improve performance by prefetching data on behalf of an application’s main thread. The focus is data prefetch helper threads that lac...