We introduce techniques to support efficient non-atomic execution of very long traces on a new binary translation based, x86-64 compatible VLIW microprocessor. Incrementally comm...
We present the TM3270 media-processor, the latest TriMedia VLIW processor, tuned to address the performance demands of standard definition video processing, combined with embedded...
Jan-Willem van de Waerdt, Stamatis Vassiliadis, Sa...
Dynamic voltage and frequency scaling (DVFS) is an effective technique for controlling microprocessor energy and performance. Existing DVFS techniques are primarily based on hardw...
Qiang Wu, Margaret Martonosi, Douglas W. Clark, Vi...
This paper describes a scalable, low-complexity alternative to the conventional load/store queue (LSQ) for superscalar processors that execute load and store instructions speculat...
Conventional processors use a fully-associative store queue (SQ) to implement store-load forwarding. Associative search latency does not scale well to capacities and bandwidths re...
As more data value speculation mechanisms are being proposed to speed-up processors, there is growing pressure on the critical processor structures that must buffer the state of t...
Clustered machines partition hardware resources to circumvent the cycle time penalties incurred by large, monolithic structures. This partitioning introduces a long inter-cluster ...
Recent experimental advances have demonstrated technologies capable of supporting scalable quantum computation. A critical next step is how to put those technologies together into...
Tzvetan S. Metodi, Darshan D. Thaker, Andrew W. Cr...
Data prefetching via helper threading has been extensively investigated on Simultaneous MultiThreading (SMT) or Virtual Multi-Threading (VMT) architectures. Although reportedly la...
Until recently, a steadily rising clock rate and other uniprocessor microarchitectural improvements could be relied upon to consistently deliver increasing performance for a wide ...
Guilherme Ottoni, Ram Rangan, Adam Stoler, David I...