Data prefetching has been widely used in the past as a technique for hiding memory access latencies. However, data prefetching in multi-threaded applications running on chip multi...
Dhruva Chakrabarti, Mahmut T. Kandemir, Mustafa Ka...
This paper describes transparent mechanisms for emulating some of the data distribution facilities offered by traditional data-parallel programming models, such as High Performance...
Dimitrios S. Nikolopoulos, Theodore S. Papatheodor...
Dynamic optimization has the potential to adapt the program’s behavior at run-time to deliver performance improvements over static optimization. Dynamic optimization systems usu...
2's complement number system imposes a fundamental limitation on the power and performance of arithmetic circuits, due to the fundamental need of cross-datapath carry propaga...
Rooju Chokshi, Krzysztof S. Berezowski, Aviral Shr...
OpenMP relies heavily on barrier synchronization to coordinate the work of threads that are performing the computations in a parallel region. A good implementation of barriers is ...
Ramachandra C. Nanjegowda, Oscar Hernandez, Barbar...