The traditional approach for characterizing complex systems is to run standard workloads and measure the resulting performance as seen by the end user. However, unique opportuniti...
Haryadi S. Gunawi, Nitin Agrawal, Andrea C. Arpaci...
CMOS scaling increases susceptibility of microprocessors to transient faults. Most current proposals for transient-fault detection use full redundancy to achieve perfect coverage ...
Memory latency tolerant architectures support thousands of in-flight instructions without scaling cyclecritical processor resources, and thousands of useful instructions can compl...
Amit Gandhi, Haitham Akkary, Ravi Rajwar, Srikanth...
To maintain coherence in conventional shared-memory multiprocessor systems, processors first check other processors’ caches before obtaining data from memory. This coherence che...
Instruction set customization is an effective way to improve processor performance. Critical portions of application dataflow graphs are collapsed for accelerated execution on s...
Nathan Clark, Jason A. Blome, Michael L. Chu, Scot...
Chip multiprocessors (CMPs) substantially increase capacity pressure on the on-chip memory hierarchy while requiring fast access. Neither private nor shared caches can provide bot...
Zeshan Chishti, Michael D. Powell, T. N. Vijaykuma...
: The theoretical study of quantum computation has yielded efficient algorithms for some traditionally hard problems. Correspondingly, experimental work on the underlying physical...
Steven Balensiefer, Lucas Kreger-Stickles, Mark Os...
Performance asymmetry in multicore architectures arises when individual cores have different performance. Building such multicore processors is desirable because many simple cores...
Saisanthosh Balakrishnan, Ravi Rajwar, Michael Upt...
Pipelined forwarding engines are used in core routers to meet speed demands. Tree-based searches are pipelined across a number of stages to achieve high throughput, but this resul...
Florin Baboescu, Dean M. Tullsen, Grigore Rosu, Su...