The Average Common Substring Approach to Phylogenomic Reconstruction

We describe a novel method for efficient reconstruction of phylogenetic trees, based on sequences of whole genomes or proteomes, whose lengths may vary greatly. The core of our method is a new measure of pairwise distances between sequences. This measure is based on computing the average lengths of maximum common substrings. It is intrinsically related to information theoretic tools (Kullback-Leibler relative entropy). We present an algorithm for efficiently computing these distances. In principle, the distance between two sequences of length ℓ can be calculated in O(ℓ) time. We implemented the algorithm using suffix arrays. The implementation is fast enough to enable the construction of the proteome phylogenomic tree for hundreds of species, and the genome phylogenomic forest for almost two thousand viruses. An initial analysis of the results exhibits a remarkable agreement with "acceptable phylogenetic and taxonomic truth". To assess our approach, it was compared to the traditional (...
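To make the phrase "average lengths of maximum common substrings" concrete, here is a minimal sketch in Python. It rests on an assumed reading of the abstract: for every position of the first sequence, take the longest substring starting there that also occurs anywhere in the second sequence, then average those lengths. The function names are hypothetical. The brute-force search stands in for the suffix-array algorithm the abstract mentions and is far slower. The normalization that turns this average into a distance is not given in the abstract and is left out.

```python
def longest_match_at(a: str, i: int, b: str) -> int:
    """Length of the longest prefix of a[i:] that occurs somewhere in b."""
    length = 0
    # Grow the candidate one character at a time while it still occurs in b.
    # Brute force for illustration only; not the suffix-array method.
    while i + length < len(a) and a[i:i + length + 1] in b:
        length += 1
    return length


def average_common_substring(a: str, b: str) -> float:
    """Mean, over all start positions in a, of the longest match found in b.

    Assumed reading of the measure described in the abstract; the
    conversion of this average into a distance is not shown here.
    """
    if not a:
        raise ValueError("first sequence must be non-empty")
    total = sum(longest_match_at(a, i, b) for i in range(len(a)))
    return total / len(a)


if __name__ == "__main__":
    # Matches of "abab" against "ba" have lengths 1, 2, 1, 1.
    print(average_common_substring("abab", "ba"))  # 1.25
```

The measure is not symmetric: `average_common_substring(a, b)` and `average_common_substring(b, a)` generally differ, so a pairwise distance built on it would need to combine both directions.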

Igor Ulitsky, David Burstein, Tamir Tuller, Benny Chor

Genome | JCB 2006 | Maximum Common Substrings | Phylogenetic |

» Fast algorithms for computing sequence distances by exhaustive substring composition

» Test-case reduction for C compiler bugs

» Area-efficient instruction set synthesis for reconfigurable system-on-chip designs

» NESTA: A Fast and Accurate First-Order Method for Sparse Recovery

Post Info

Added	13 Dec 2010
Updated	13 Dec 2010
Type	Journal
Year	2006
Where	JCB
Authors	Igor Ulitsky, David Burstein, Tamir Tuller, Benny Chor

Sciweavers
