Supercomputer performance is highly dependent on the design of its interconnection subsystem. In this paper we study how different architectural approaches to router design impact system performance when running real parallel applications. A thorough methodology has been employed to quantify this impact. Architectural router decisions have been chosen taking into account the constraints of the underlying VLSI technology. After that, an exhaustive evaluation of the interconnection network under standard synthetic traffic has been carried out. Finally, an execution-driven simulation environment has been used to assess the consequences of several router designs on the performance of the entire machine. We show that low-level decisions, such as the adequate selection of the router's arbiter, significantly reduce the execution time of parallel applications. To illustrate the effects of the router architecture on system performance, two benchmarks were selected: Radix and MP3D.
Valentin Puente, José A. Gregorio, Cruz Izu