This paper discusses the obstacles that stand in the way of doing a good job of machine-code analysis. Compared with analysis of source code, the challenge is to drop all assumptions about having certain kinds of information available (variables, control-flow graph, callgraph, etc.) and also to address new kinds of behaviors (arithmetic on addresses, jumps to “hidden” instructions starting at positions that are out of registration with the instruction boundaries of a given reading of an instruction stream, self-modifying code, etc.). The paper describes some of the challenges that arise when analyzing machine code, and what can be done about them. It also provides a rationale for some of the design decisions made in the machine-codeanalysis tools that we have built over the past few years.
Thomas W. Reps, Junghee Lim, Aditya V. Thakur, Gog