Abstract—Execution of applications on upcoming high-performance computing (HPC) systems introduces a variety of new challenges and amplifies many existing ones. These systems will be composed of a large number of “fat” nodes, where each node consists of multiple processors on a chip with symmetric multithreading capabilities, interconnected via high-performance networks. Traditional system software for parallel computing treats these chip multiprocessors (CMPs) as arrays of symmetric multiprocessing cores, when in fact the two differ fundamentally. This approach forfeits opportunities for optimization on CMPs. We show that support for fine-grained parallelism, coupled with an integrated approach to scheduling compute and communication tasks, is required for efficient execution on this architecture. We propose Phoenix, a runtime system designed specifically for execution on CMP architectures to address the challenges of performance and programmability for...
Avneesh Pant, Hassan Jafri, Volodymyr V. Kindraten