Automatic vectorization of programs for partitioned-ALU SIMD (Single Instruction Multiple Data) processors has been difficult because of not only data dependency issues but also n...
We introduce a new non-intrusive on-chip cache-tuning hardware module capable of accurately predicting the best configuration of a configurable cache for an executing application....
Ann Gordon-Ross, Pablo Viana, Frank Vahid, Walid A...
This paper describes hardware methods, a lightweight and platform-independent scheme for linking real-time Java code to co-processors implemented using a hardware description lang...
Multicore designs have emerged as the mainstream design paradigm for the microprocessor industry. Unfortunately, providing multiple cores does not directly translate into performa...
Mojtaba Mehrara, Jeff Hao, Po-Chun Hsu, Scott A. M...
The optimistic nature of Transactional Memory (TM) systems can lead to the concurrent execution of transactions that are later found to conflict. Conflicts degrade scalability, a...