This paper reports on an empirical evaluation of the fault-detecting ability of two white-box software testing techniques: decision coverage (branch testing) and the all-uses data flow testing criterion. Each subject program was tested using a very large number of randomly generated test sets. For each test set, the extent to which it satisfied the given testing criterion was measured and it was determined whether or not the test set detected a program fault. These data were used to explore the relationship between the coverage achieved by test sets and the likelihood that they will detect a fault. Previous experiments of this nature have used relatively small subject programs and/or have used programs with seeded faults. In contrast, the subjects used here were eight versions of an antenna configuration program written for the European Space Agency, each consisting of over 10,000 lines of C code. For each of the subject programs studied, the likelihood of detecting a fault increased ...
Phyllis G. Frankl, Oleg Iakounenko
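The relationship the abstract describes, between the coverage a test set achieves and the likelihood that it detects a fault, can be illustrated with a minimal sketch. It assumes each random test set has been reduced to a pair (coverage fraction, fault detected). The function name `detection_rate_by_coverage`, the record layout, and the equal-width binning are hypothetical choices made for illustration; they are not the authors' procedure, and the sample records are invented solely to show the call.

```python
from collections import defaultdict


def detection_rate_by_coverage(records, num_bins=10):
    """Estimate the fraction of fault-detecting test sets per coverage bin.

    records:  iterable of (coverage, detected) pairs, where coverage is the
              fraction of the criterion's requirements satisfied (0.0 to 1.0)
              and detected is True if the test set exposed a fault.
    num_bins: number of equal-width coverage bins (an illustrative choice).

    Returns a dict mapping each non-empty bin's lower bound to the observed
    detection rate in that bin.
    """
    totals = defaultdict(int)  # test sets seen per bin
    hits = defaultdict(int)    # fault-detecting test sets per bin

    for coverage, detected in records:
        # Clamp so that coverage == 1.0 falls into the top bin.
        bin_index = min(int(coverage * num_bins), num_bins - 1)
        totals[bin_index] += 1
        hits[bin_index] += int(detected)

    return {b / num_bins: hits[b] / totals[b] for b in sorted(totals)}


if __name__ == "__main__":
    # Invented records for demonstration only; not data from the study.
    sample = [(0.25, False), (0.25, True), (0.95, True), (1.0, True)]
    for lower_bound, rate in detection_rate_by_coverage(sample, 2).items():
        print(f"coverage >= {lower_bound:.2f}: detection rate {rate:.2f}")
```

Binning is only one way to summarize such data; with a very large number of test sets, narrower bins give a finer view of how detection likelihood changes as coverage approaches 100%.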