The load-store unit is a performance critical component of a dynamically-scheduled processor. It is also a complex and non-scalable component. Several recently proposed techniques...
As chip densities and clock rates increase, processors are becoming more susceptible to transient faults that can affect program correctness. Up to now, system designers have prim...
George A. Reis, Jonathan Chang, Neil Vachharajani,...
Writing concurrent programs is difficult because of the complexity of ensuring proper synchronization. Conventional lock-based synchronization suffers from wellknown limitations, ...
As processor speeds increase and memory latency becomes more critical, intelligent design and management of secondary caches becomes increasingly important. The efficiency of curr...
Moinuddin K. Qureshi, David Thompson, Yale N. Patt
Sensor network processors and their applications are a growing area of focus in computer system research and design. Inherent to this design space is a reduced processing performa...
Leyla Nazhandali, Bo Zhai, Javin Olson, Anna Reeve...
Significant time is spent by companies trying to reproduce and fix the bugs that occur for released code. To assist developers, we propose the BugNet architecture to continuousl...
Runahead execution is a technique that improves processor performance by pre-executing the running application instead of stalling the processor when a long-latency cache miss occ...
RENO is a modified MIPS R10000 register renamer that uses map-table “short-circuiting” to implement dynamic versions of several well-known static optimizations: move eliminat...
It has been shown that many requests miss in all remote nodes in shared memory multiprocessors. We are motivated by the observation that this behavior extends to much coarser grai...