Multiprocessors-on-chip, such as the Cell BE processor, regularly suffer from restricted bandwidth to off-chip main memory. We propose to reduce memory bandwidth requirements, and...
As CMP platforms are widely adopted, more and more cores are integrated on to the die. To reduce the off-chip memory access, the last level cache is usually organized as a distribu...
- Bioinformatics applications such as gene and protein sequence matching algorithms are characterized by the need to process large amounts of data. While uni-processor performance ...
Ravi Kiran Karanam, Arun Ravindran, Arindam Mukher...
This paper presents the OpenDF framework and recalls that dataflow programming was once invented to address the problem of parallel computing. We discuss the problems with an impe...
Shuvra S. Bhattacharyya, Gordon J. Brebner, Jö...
As multicore architectures gain widespread use, it becomes increasingly important to be able to harness their additional processing power to achieve higher performance. However, e...
David Zhang, Qiuyuan J. Li, Rodric Rabbah, Saman A...
While technology trends have ushered in the age of chip multiprocessors (CMP) and enabled designers to place an increasing number of cores on chip, a fundamental question is what ...
Divya Gulati, Changkyu Kim, Simha Sethumadhavan, S...
Conventional programming models were designed to be used by expert programmers for programming for largescale multiprocessors, distributed computational clusters, or specialized p...
1 This paper presents a simple but effective method to reduce on-chip access latency and improve core isolation in CMP Non-Uniform Cache Architectures (NUCA). The paper introduces ...
Javier Merino, Valentin Puente, Pablo Prieto, Jos&...