Efficient Enumeration of Phylogenetically Informative Substrings

Download: www.cis.upenn.edu

We study the problem of enumerating substrings that are common amongst genomes that share evolutionary descent. For example, one might want to enumerate all identical (and therefore conserved) substrings that are shared by all mammals and not found in any nonmammal. Such a collection of substrings may be used to identify conserved subsequences or to construct sets of identifying substrings for branches of a phylogenetic tree. For two disjoint sets of genomes on a phylogenetic tree, a substring is called a discriminating substring, or a tag, if it is found in all of the genomes of one set and in none of the genomes of the other set. Given a phylogeny for a set of m species, each with a genome of length at most n, we develop a suffix-tree-based algorithm to find all tags in O(nm log^2 m) time. We also develop a sublinear-space algorithm (at the expense of running time) that is better suited to very large data sets. We next consider a stochastic model of evolution to understand how tags arise. We sh...
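To make the definition of a tag concrete, here is a minimal brute-force sketch in Python. It is not the authors' suffix-tree algorithm and makes no attempt at the O(nm log^2 m) bound; it simply tests every substring of one in-group genome against the definition. The names `find_tags`, `minimal_tags`, `in_group` and `out_group` are hypothetical, and the toy DNA strings are made up for illustration. The two groups play the role of the two disjoint sets of genomes on the tree (for instance, the leaves of one clade versus the rest).

```python
from typing import Iterable, Set


def substrings(s: str) -> Set[str]:
    # All distinct non-empty substrings of s; there are O(len(s)^2) of them.
    return {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}


def find_tags(in_group: Iterable[str], out_group: Iterable[str]) -> Set[str]:
    # Hypothetical brute-force helper taken straight from the definition:
    # a tag occurs in every in-group genome and in no out-group genome.
    # This is NOT the paper's suffix-tree algorithm.
    ins = list(in_group)
    outs = list(out_group)
    if not ins:
        return set()
    # Every tag must occur in the first in-group genome, so its substrings
    # are the only candidates worth testing.
    candidates = substrings(ins[0])
    return {
        t
        for t in candidates
        if all(t in g for g in ins[1:]) and not any(t in g for g in outs)
    }


def minimal_tags(tags: Set[str]) -> Set[str]:
    # Hypothetical helper: keep only tags containing no shorter tag.
    return {t for t in tags if not any(u in tags for u in substrings(t) if u != t)}


if __name__ == "__main__":
    in_group = ["ACGTTA", "GACGTC"]  # made-up toy "genomes"
    print(find_tags(in_group, ["ACGAAA", "TTTCGT"]))  # {'ACGT'}
    tags = find_tags(in_group, ["ACGAAA"])
    print(sorted(tags))  # ['ACGT', 'CGT', 'GT', 'T']
    print(minimal_tags(tags))  # {'T'}
```

The second call shows why a compact summary helps: any substring common to the in-group that contains a tag is itself a tag (if it occurred in an out-group genome, so would the tag inside it), so the full tag set can be much larger than its minimal members.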

Computational Biology | O(nm log^2 m) | Phylogenetic Tree | RECOMB 2006 | Simple Linear Programming

Post Info

Added	03 Dec 2009
Updated	03 Dec 2009
Type	Conference
Year	2006
Where	RECOMB
Authors	Stanislav Angelov, Boulos Harb, Sampath Kannan, Sanjeev Khanna, Junhyong Kim

Sciweavers
