Dynamic binary translation systems enable a wide range of applications such as program instrumentation, optimization, and security. DBTs use a software code cache to store previously translated instructions. The code layout in the code cache greatly differs from the code layout of the original program. This paper provides an exhaustive analysis of the performance of the instruction/trace cache and other structures of the microarchitecture while executing DBTs that focus on program instrumentation, such as DynamoRIO and Pin. We performed our evaluation along two axes. First, we directly accessed the hardware performance counters to determine actual cache miss counts. Second, we used simulation to analyze the spatial locality of the translated application. Our results show that when executing an application under the control of Pin or DynamoRIO, the icache miss counts actually increase over 2X. Surprisingly, the L2 cache and the L1 data cache show a much lower performance degradation or...
Arkaitz Ruiz-Alvarez, Kim M. Hazelwood