Genome comparison without alignment using shortest unique substrings

Background: Sequence comparison by alignment is a fundamental tool of molecular biology. In this paper we show how a number of sequence comparison tasks, including the detection of unique genomic regions, can be accomplished efficiently without an alignment step. Our procedure for nucleotide sequence comparison is based on shortest unique substrings. These are substrings which occur only once within the sequence or set of sequences analysed and which cannot be further reduced in length without losing the property of uniqueness. Such substrings can be detected using generalized suffix trees.

Results: We find that the shortest unique substrings in Caenorhabditis elegans, human and mouse are no longer than 11 bp in the autosomes of these organisms. In mouse and human these unique substrings are significantly clustered in upstream regions of known genes. Moreover, the probability of finding such short unique substrings in the genomes of human or mouse by chance is extremely small. We deri...
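To make the definition of a shortest unique substring concrete, here is a minimal Python sketch. It does not use generalized suffix trees as the abstract describes; it swaps in brute-force k-mer counting, which is only practical for toy inputs. The function names (`count_kmers`, `shortest_unique_substrings`) and the example sequences are hypothetical, and the sketch looks at a single strand of a single sequence only.

```python
from collections import Counter


def count_kmers(seq, k):
    """Count every (overlapping) substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shortest_unique_substrings(seq, max_k=None):
    """Return (k, hits) for the smallest k at which some k-mer occurs once.

    hits is a list of (position, substring) pairs, positions 0-based.
    Because no shorter substring is unique, each hit cannot be reduced
    in length without losing uniqueness.
    Brute force stand-in for the suffix-tree approach (assumption of
    this sketch, not the paper's method).
    """
    n = len(seq)
    max_k = n if max_k is None else min(max_k, n)
    for k in range(1, max_k + 1):
        counts = count_kmers(seq, k)
        hits = [
            (i, seq[i:i + k])
            for i in range(n - k + 1)
            if counts[seq[i:i + k]] == 1
        ]
        if hits:
            return k, hits
    # No unique substring found within the length limit.
    return None, []


if __name__ == "__main__":
    # Hypothetical toy sequence: every single base repeats,
    # so the shortest unique substring has length 2.
    print(shortest_unique_substrings("AACCAACC"))
```

On a real genome this loop would be far too slow and memory-hungry, which is why an index structure such as the suffix tree mentioned in the abstract is the natural choice there.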

Bernhard Haubold, Nora Pierstorff, Friedrich Möller, Thomas Wiehe

BMCBI 2005 | Sequence Comparison | Shortest Unique Substring | Unique Substring |

Added	15 Dec 2010
Updated	15 Dec 2010
Type	Journal
Year	2005
Where	BMCBI
Authors	Bernhard Haubold, Nora Pierstorff, Friedrich Möller, Thomas Wiehe
