With the ongoing advancements in VLSI technology, the performance of an embedded system is determined to a large extent by the communication of data and instructions. This results...
Per-core local (scratchpad) memories allow direct inter-core communication, with latency and energy advantages over coherent cache-based communication, especially as CMP architect...
Stamatis G. Kavadias, Manolis Katevenis, Michail Z...
On systems with multi-core processors, the memory access scheduling scheme plays an important role not only in utilizing the limited memory bandwidth but also in balancing the pro...
We explore the possibilities to organize a query data structure in the main memories or hard disks of a cluster computer. The query data structure serves to improve the performanc...
Optimal network performance is critical to efficient parallel scaling for communication-bound applications on large machines. With wormhole routing, no-load latencies do not increa...
Abhinav Bhatele, Eric J. Bohm, Laxmikant V. Kalé...