Software reverse engineering commonly relies on an extractor, which parses source code and extracts facts about it. The level of detail in these facts varies from extractor to extractor. This paper describes four increasingly detailed levels of completeness for these facts (semantic completeness, compiler completeness, syntax completeness, and source completeness) and introduces the concept of relative completeness of extractors. Validating that an extractor correctly produces facts at a given level of completeness is, in general, very challenging. This paper gives a method for validating the semantic completeness of an extractor and describes the application of this method to CPPX, an extractor for C and C++ based on GCC.
Yuan Lin, Richard C. Holt, Andrew J. Malton