VLIW architectures are popular in embedded systems because they offer high-performance processing at low cost and energy. The major problem with traditional VLIW designs is that t...
Hongtao Zhong, Kevin Fan, Scott A. Mahlke, Michael...
Data prefetching has been widely used in the past as a technique for hiding memory access latencies. However, data prefetching in multi-threaded applications running on chip multi...
Dhruva Chakrabarti, Mahmut T. Kandemir, Mustafa Ka...
In an out-of-order issue processor, instructions are dynamically reordered and issued to function units in their dataready order rather than their original program order to achiev...
Fine-Grained Cycle Sharing (FGCS) systems aim at utilizing the large amount of idle computational resources available on the Internet. Such systems allow guest jobs to run on a ho...
Heterogeneous multicores, such as Cell BE processors and GPGPUs, typically do not have caches for their accelerator cores because coherence traffic, cache misses, and latencies fr...