We present an extension to SimICS, an existing SPARC V8 instruction set simulator, that supports accurate profiling of branches and instruction cache misses. SimICS previously supported profiling of data cache efficiency and virtual memory performance (TLB misses), as well as estimated execution profiling using sampling. The new design allows a system-level, threaded-code simulator of a computer system to efficiently support a relatively complete range of instrumentation. Principal applications include computer architecture studies and performance tuning of software. Both application areas require reasonable performance to support realistic workloads, and both benefit from the flexibility, generality, and portability of a fast threaded-code simulator. The presented design supports multiprocessor simulation, system-level (operating system) programs, and, in principle, arbitrary user programs, including run-time generated code. We evaluate the performance using the SPECint95 benchmark su...
Peter S. Magnusson