Abstract—A poorly understood but important factor in random testing is the selection of a maximum length for test runs. Given a limited time for testing, it is seldom clear whether executing a small number of long runs or a large number of short runs maximizes utility. It is generally expected that longer runs are more likely to expose failures — which is certainly true with respect to runs shorter than the shortest failing trace. However, longer runs produce longer failing traces, requiring more effort from humans in debugging or more resources for automated minimization. In testing with feedback, increasing ranges for parameters may also cause the probability of failure to decrease in longer runs. We show that the choice of test length dramatically impacts the effectiveness of random testing, and that the patterns observed in simple models and predicted by analysis are useful in understanding effects observed in a large scale case study of a JPL flight software system.
James H. Andrews, Alex Groce, Melissa Weston, Ru-G