Raising seeds for biological experiments is prone to error; a careful experimenter will test in the lab to verify that plants are of the intended strain. Choosing a minimal set of tests that discriminates between all known seedlines is an instance of Minimal Test Set, an NP-complete problem. Similar biological problems, such as minimizing the number of haplotype tag SNPs, require complex nondeterministic heuristics to solve in reasonable timeframes on modest datasets. However, selecting the minimal marker set to discriminate among seedlines is less complicated than other problems considered in the literature; we show that a simple heuristic approach works well in practice. Finding all minimal sets of tests to identify 91 Zea mays recombinant inbred lines would require months of CPU time; our heuristic gives a result less than twice the minimum possible size in under five seconds, with similar performance on Arabidopsis thaliana recombinant inbred lines.
Thomas C. Hudson, Ann E. Stapleton, Amy M. Curley
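
The abstract does not say which heuristic is used, so the sketch below should be read only as an illustration of the Minimal Test Set problem, not as the authors' method. It assumes a common greedy strategy: repeatedly pick the marker that separates the most pairs of seedlines that are still indistinguishable. The function name `greedy_test_set`, the input layout (a dict from seedline name to a string of marker results), and the toy data are all hypothetical.

```python
from itertools import combinations


def greedy_test_set(lines):
    """Return marker indices that together distinguish every pair of seedlines.

    `lines` maps a seedline name to a sequence of marker results, all of the
    same length (an assumed layout, not taken from the paper). The result is
    a valid discriminating set, but it is not guaranteed to be minimal.
    """
    if not lines:
        return []
    names = list(lines)
    n_markers = len(lines[names[0]])
    # Every pair of seedlines must be separated by at least one chosen marker.
    unresolved = set(combinations(names, 2))
    chosen = []

    def gain(marker):
        # Number of still-unresolved pairs this marker would separate.
        return sum(lines[a][marker] != lines[b][marker] for a, b in unresolved)

    while unresolved:
        best = max(range(n_markers), key=gain)  # ties go to the lowest index
        if gain(best) == 0:
            raise ValueError("some seedlines are identical at every marker")
        chosen.append(best)
        # Keep only the pairs that the new marker fails to separate.
        unresolved = {(a, b) for a, b in unresolved
                      if lines[a][best] == lines[b][best]}
    return chosen


# Toy data: four hypothetical seedlines scored at three markers.
toy_lines = {
    "line1": "AAB",
    "line2": "ABB",
    "line3": "BAA",
    "line4": "BBA",
}
selected = greedy_test_set(toy_lines)
print(selected)
```

On this toy input, marker 2 carries the same information as marker 0, so two markers suffice. A greedy pass of this kind runs in time polynomial in the number of seedlines and markers, which is the sort of trade-off the abstract describes: a quick answer that may be larger than the true minimum.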