A variety of compilers, static analyses, and testing frameworks rely heavily on path frequency information. Uses for such information range from optimizing transformations to bug finding. Path frequencies are typically obtained through profiling, but that approach is severely restricted: it requires running programs in an indicative environment, and on indicative test inputs. We present a descriptive statistical model of path frequency based on features that can be readily obtained from a program's source code. Our model is over 90% accurate with respect to several benchmarks, and is sufficient for selecting the 5% of paths that account for over half of a program's total runtime. We demonstrate our technique's robustness by measuring its performance as a static branch predictor, finding it to be more accurate than previous approaches on average. Finally, our qualitative analysis of the model provides insight into which source-level features indicate "hot paths.&quo...
Raymond P. L. Buse, Westley Weimer