This paper is concerned with the reconstruction of perfect phylogenies from binary character data with missing values, and with the related problems of inferring complete haplotypes from haplotypes or genotypes with missing data. In cases where the problems considered are NP-hard, we assume a rich data hypothesis under which they become tractable. Natural probabilistic models are introduced for the generation of character vectors, haplotypes or genotypes with missing data, and it is shown that these models support the rich data hypothesis. The principal results include:
• A near-linear time algorithm for inferring a perfect phylogeny from binary character data (or haplotype data) with missing values, under the rich data hypothesis;
• A quadratic-time algorithm that, with high probability, infers a perfect phylogeny from genotype data with missing values, under certain distributional assumptions;
• Demonstration that the problems of maximum-likelihood inference of complete haplotypes from partial...
Eran Halperin, Richard M. Karp