Architectures that exploit control independence (CI) promise to remove in-order fetch bottlenecks, like branch mispredicts, instruction-cache misses and fetch unit stalls, from th...
Instruction-grain program monitoring tools, which check and analyze executing programs at the granularity of individual instructions, are invaluable for quickly detecting bugs and...
Shimin Chen, Michael Kozuch, Theodoros Strigkos, B...
Within-die process variation causes individual cores in a Chip Multiprocessor (CMP) to differ substantially in both static power consumed and maximum frequency supported. In this ...
We expect that many-core microprocessors will push performance per chip from the 10 gigaflop to the 10 teraflop range in the coming decade. To support this increased performance...
Dana Vantrease, Robert Schreiber, Matteo Monchiero...
This paper seeks to understand and design nextgeneration servers for emerging “warehousecomputing” environments. We make two key contributions. First, we put together a detail...
Kevin T. Lim, Parthasarathy Ranganathan, Jichuan C...
Performance improvement solely through transistor scaling is becoming more and more difficult, thus it is increasingly common to see domain specific accelerators used in conjunc...
Communication overheads are one of the fundamental challenges in a multiprocessor system. As the number of processors on a chip increases, communication overheads and the distribu...
Katherine E. Coons, Behnam Robatmili, Matthew E. T...
The memory access limits the performance of stream processors. By exploiting the reuse of data held in the Stream Register File (SRF), an on-chip storage, the number of memory acc...
Xuejun Yang, Ying Zhang, Jingling Xue, Ian Rogers,...