Profile data is valuable for identifying performance bottlenecks and guiding optimizations. Periodic sampling of a processor's performance monitoring hardware is an effective...
Jeffrey Dean, James E. Hicks, Carl A. Waldspurger,...
As processors continue to exploit more instruction level parallelism, a greater demand is placed on reducing the e ects of memory access latency. In this paper, we introduce a nov...
High performance architectures depend heavily on efficient multi-level memory hierarchies to minimize the cost of accessing data. This dependence will increase with the expected i...
Highly aggressive multi-issue processor designs of the past few years and projections for the next decade require that we redesign the operation of the cache memory system. The nu...
Jude A. Rivers, Gary S. Tyson, Edward S. Davidson,...
We describe Dead Value Information (DVI) and introduce three new optimizations which exploit it. DVI provides assertions that certain register values are dead, meaning they will n...
Compiler optimizations are often driven by specific assumptions about the underlying architecture and implementation of the target machine. For example, when targeting shared-mem...
Jack L. Lo, Susan J. Eggers, Henry M. Levy, Sujay ...
We propose a method for compressing programs in embedded processors where instruction memory size dominates cost. A post-compilation analyzer examines a program and replaces commo...
Charles Lefurgy, Peter L. Bird, I-Cheng K. Chen, T...