This paper describes a compiler for stream programs that efficiently schedules computational kernels and stream memory operations, and allocates on-chip storage. Our compiler uses...
In this paper, we present several novel strategies to improve software controlled cache utilization, so as to achieve lower power requirements for multi-media and signal processin...
Main memory in many tera-scale systems requires tens of kilowatts of power. The resulting energy consumption increases system cost and the heat produced reduces reliability. Emerg...
Matthew E. Tolentino, Joseph Turner, Kirk W. Camer...
Operand bypass logic might be one of the critical structures for future microprocessors to achieve high clock speed. The delay of the logic imposes the execution time budget to be ...
— High-end computing (HEC) systems have passed the petaflop barrier and continue to move toward the next frontier of exascale computing. As companies and research institutes con...
Narayan Desai, Darius Buntinas, Daniel Buettner, P...