To achieve maximum performance gains through compiler optimization, most automatic performance tuning systems use a feedback-directed approach to rate the code versions generated under different optimization options and to search for the best one. They all face the problem that code versions are comparable only if they run under the same execution context. This paper proposes three accurate, fast, and flexible rating methods that address this problem. The three methods identify comparable execution contexts, model relationships between contexts, or force re-execution of the code under the same context, respectively. We apply these methods in an automatic, offline tuning scenario. Our performance tuning system improves the program performance of a selection of SPEC CPU 2000 benchmarks by up to 178% (26% on average). Our techniques reduce program tuning time by up to 96% (80% on average), compared to the state-of-the-art tuning scenario that compares optimization techniques using whole-program execution.
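To make the feedback-directed rating idea concrete, the following is a minimal sketch, assuming a gcc toolchain; the file name benchmark.c, the input bench.in, and the candidate flag sets are hypothetical placeholders for illustration, not the paper's actual search space or implementation. It realizes only the simplest of the three rating strategies: re-running every code version under the same fixed execution context, so that their timings are directly comparable.

```python
"""Sketch of a feedback-directed tuning loop: compile the program under
each candidate optimization option set, run every resulting version in
the same fixed execution context, and keep the fastest one."""

import subprocess
import time

SOURCE = "benchmark.c"              # hypothetical program under tuning
CONTEXT = ["./a.out", "bench.in"]   # fixed input = same execution context

# Candidate optimization option sets to rate (illustrative only).
CANDIDATES = [
    ["-O2"],
    ["-O3"],
    ["-O3", "-funroll-loops"],
    ["-O3", "-fno-strict-aliasing"],
]

def rate(flags):
    """Compile with the given flags and time one run in the fixed context."""
    subprocess.run(["gcc", *flags, SOURCE, "-o", "a.out"], check=True)
    start = time.perf_counter()
    subprocess.run(CONTEXT, check=True)
    return time.perf_counter() - start

best_flags, best_time = None, float("inf")
for flags in CANDIDATES:
    elapsed = rate(flags)
    print(f"{' '.join(flags):30s} {elapsed:8.3f} s")
    if elapsed < best_time:          # keep the fastest version so far
        best_flags, best_time = flags, elapsed

print("best:", " ".join(best_flags), f"({best_time:.3f} s)")
```

Because each candidate is rated with a whole-program run, this naive loop corresponds to the baseline tuning scenario the paper improves upon; the proposed methods reduce tuning time by rating code versions from partial executions while preserving comparability of contexts.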