We present an architecture that features dynamic multithreading execution of a single program. Threads are created automatically by hardware at procedure and loop boundaries and e...
The load-store unit is a performance critical component of a dynamically-scheduled processor. It is also a complex and non-scalable component. Several recently proposed techniques...
In this paper we evaluate the atomic region compiler abstraction by incorporating it into a commercial system. We find that atomic regions are simple and intuitive to integrate i...
Naveen Neelakantam, David R. Ditzel, Craig B. Zill...
With increasing clock frequencies and silicon integration, power aware computing has become a critical concern in the design of embedded processors and systems-on-chip. One of the...
Dan Ernst, Nam Sung Kim, Shidhartha Das, Sanjay Pa...
In this paper, we extensively tune and then compare the performance of web servers based on three different server architectures. The µserver utilizes an event-driven architectur...
David Pariag, Tim Brecht, Ashif S. Harji, Peter A....