We describe a method for performance analysis of large software systems that combines a fast instruction-set simulator with off-line, detailed analysis of selected segments of the execution. The combination is faster than pure cycle-accurate simulation, simpler and more flexible than techniques that rely on hardware monitoring, and still accurate. Specifically, the instruction-set simulator, running at a slowdown of around 50, maintains enough target state throughout the execution that an arbitrarily collected segment of the instruction trace is sufficient input for a post-processing, cycle-accurate model of the processor and memory hierarchy. We present a case study to support our contention that this reduced state is sufficient input to a cycle-accurate simulator. Using a commercial M88110-based prototype system as a reference point, we show for three trace segments that the cycle-accurate post-processing yields data reliable enough to guide system optimization.
Bengt Werner, Peter S. Magnusson
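The division of labour summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' simulator: the reduced state is assumed here to be the tag array of a direct-mapped cache, and all names (`FastSimulator`, `collect_segment`, `replay_cycles`) and the cache geometry and latencies are hypothetical values chosen for illustration. The fast simulator updates the tags on every access without modelling time; a segment is collected together with a snapshot of those tags, and a separate timing model replays the segment off-line.

```python
from dataclasses import dataclass, field

# All parameters below are assumptions for illustration, not M88110 values.
LINE_BITS = 5      # 32-byte cache lines
NUM_SETS = 64      # direct-mapped cache with 64 sets
HIT_CYCLES = 1
MISS_CYCLES = 10


def index_and_tag(addr):
    """Split an address into a cache set index and a tag."""
    line = addr >> LINE_BITS
    return line % NUM_SETS, line // NUM_SETS


@dataclass
class FastSimulator:
    """Functional simulation: tracks cache tags (reduced state), no timing."""
    tags: dict = field(default_factory=dict)

    def access(self, addr):
        idx, tag = index_and_tag(addr)
        self.tags[idx] = tag

    def collect_segment(self, addresses):
        """Snapshot the reduced state, then execute and record the segment."""
        snapshot = dict(self.tags)
        for addr in addresses:
            self.access(addr)
        return snapshot, list(addresses)


def replay_cycles(snapshot, trace):
    """Off-line timing model: replay a trace segment from a given snapshot."""
    tags = dict(snapshot)
    cycles = 0
    for addr in trace:
        idx, tag = index_and_tag(addr)
        if tags.get(idx) == tag:
            cycles += HIT_CYCLES
        else:
            cycles += MISS_CYCLES
            tags[idx] = tag
    return cycles


sim = FastSimulator()
for i in range(256):                      # execution before the segment
    sim.access(0x1000 + 4 * i)

snapshot, trace = sim.collect_segment([0x1000 + 4 * i for i in range(64)])
warm = replay_cycles(snapshot, trace)     # starts from the maintained state
cold = replay_cycles({}, trace)           # same trace, no prior state
print(warm, cold)
```

In this toy setting the replay that starts from the maintained state reports fewer cycles than the replay of the same trace from an empty cache, which is the kind of error the reduced state is meant to avoid when a segment is collected at an arbitrary point in the execution.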