Optimal network performance is critical to efficient parallel scaling for communication-bound applications on large machines. With wormhole routing, no-load latencies do not increa...
Abhinav Bhatele, Eric J. Bohm, Laxmikant V. Kal&ea...
A large class of systems of biological and technological relevance can be described as analog networks, that is, collections of dynamic devices interconnected by links of varying s...
Claudio Mattiussi, Daniel Marbach, Peter Dürr, Da...
Buffers in on-chip networks consume significant energy, occupy chip area, and increase design complexity. In this paper, we make a case for a new approach to designing on-chip in...
This paper deals with general nested loops and proposes a novel scheduling methodology for reducing the communication cost of parallel programs. General loops contain complex loop...
Florina M. Ciorba, Theodore Andronikos, Ioannis Dr...
Continued device scaling enables microprocessors and other systems-on-chip (SoCs) to increase their performance, functionality, and hence, complexity. Simultaneously, relentless s...