With advances in hardware, instruction set architectures are undergoing continual evolution. As a result, compilers are under constant pressure to adapt and take full advantage of available features. However, current techniques for evaluating relative compiler performance only compare profiles at the application level, ignoring relative performance differences at finer granularities. To ensure that new features are put to good use, a more rigorous approach is necessary. A fundamental step in tuning compiler performance is identifying the specific examples that can be improved. To solve this problem, we present a compiler-independent binary matching technique to compare executions of differently compiled programs and identify intervals where the behavior can be meaningfully compared. Matched intervals can be automatically analyzed to identify anomalous segments of execution where one version performs significantly differently versus another. We present case studies using Chainsaw t...