Abstract. So far, most methods for identifying sequences under selection based on comparative sequence data have either assumed that selective pressures are the same across all branches of a phylogeny or focused on changes in specific lineages of interest. Here, we introduce a more general method that detects sequences that have either come under selection or begun to drift on any lineage. The method is based on a phylogenetic hidden Markov model (phylo-HMM) and does not require element boundaries to be determined a priori, making it particularly useful for identifying noncoding sequences. Insertions and deletions (indels) are incorporated into the phylo-HMM by a simple strategy that uses a separately reconstructed "indel history." To evaluate the statistical significance of predictions, we introduce a novel method for computing P-values based on prior and posterior distributions of the number of substitutions that have occurred in the evolution of predicted elements. W...
Adam C. Siepel, Katherine S. Pollard, David Haussl