Modern network devices employ deep packet inspection to enable sophisticated services such as intrusion detection, traffic shaping, and load balancing. At the heart of such servi...
Randy Smith, Neelam Goyal, Justin Ormont, Karthike...
Until recently, parallel programming has largely focused on the exploitation of data-parallelism in dense matrix programs. However, many important application domains, including m...
Milind Kulkarni, Martin Burtscher, Calin Cascaval,...
Abstract—Developing CPU scheduling algorithms and understanding their impact in practice can be difficult and time consuming due to the need to modify and test operating system ...
Modern Graphic Processing Units (GPUs) provide sufficiently flexible programming models that understanding their performance can provide insight in designing tomorrow’s manyco...
Ali Bakhoda, George L. Yuan, Wilson W. L. Fung, He...
Until very recently, microprocessor designs were computation-centric. On-chip communication was frequently ignored. This was because of fast, single-cycle on-chip communication. T...
Many workload characterization studies depend on accurate measurements of the cost of executing a piece of code. Often these measurements are conducted using infrastructures to ac...
Dmitrijs Zaparanuks, Milan Jovic, Matthias Hauswir...
Insights into branch predictor organization and operation can be used in architecture-aware compiler optimizations to improve program performance. Unfortunately, such details are ...
This paper analyzes the performance of the TRIPS prototype chip’s block predictor. The prototype is the first implementation of the block-atomic TRIPS architecture, wherein the...
Nitya Ranganathan, Doug Burger, Stephen W. Keckler
We describe and evaluate two new, independently-applicable power reduction techniques for power management on processors that support dynamic voltage and frequency scaling (DVFS):...
Bin Lin, Arindam Mallik, Peter A. Dinda, Gokhan Me...