Simultaneous identification of long similar substrings in large sets of sequences



Background: Sequence comparison faces new challenges now that many complete genomes and large transcript libraries are available. Gene annotation pipelines match these sequences to identify genes and their alternative splice forms. However, currently available software cannot simultaneously compare sets of sequences as large as required, especially when errors must be taken into account.

Results: We therefore present a new algorithm for identifying almost perfectly matching substrings in very large sets of sequences. Its implementation, called ClustDB, is considerably faster and can handle 16 times more data than VMATCH, the most memory-efficient exact program known today. ClustDB simultaneously generates large sets of exactly matching substrings of a given minimum length, which serve as seeds for a novel method of match extension with errors. This method generates alignments of maximum length in which the number of errors within each overlapping window of a given size does not exceed a given maximum. Such alignm...
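The seed-and-extend idea described in the abstract can be illustrated with a small sketch. This is not ClustDB's implementation: the k-mer hash index, the function names exact_seeds and extend_right, and the restriction to substitution errors (no insertions or deletions) are all assumptions made for illustration, and the approach would not scale to the data sizes the paper targets.

```python
from collections import defaultdict


def exact_seeds(seqs, k):
    """Find maximal exact matches of length >= k between different sequences.

    Hypothetical helper: uses a plain k-mer hash index, which is an
    assumption for illustration and not the data structure used by ClustDB.
    Returns sorted tuples (seq_i, pos_i, seq_j, pos_j, length).
    """
    index = defaultdict(list)
    for i, s in enumerate(seqs):
        for p in range(len(s) - k + 1):
            index[s[p:p + k]].append((i, p))

    seeds = set()
    for hits in index.values():
        for a in range(len(hits)):
            for b in range(a + 1, len(hits)):
                (i, p), (j, q) = hits[a], hits[b]
                if i == j:
                    continue  # only compare different sequences
                s, t = seqs[i], seqs[j]
                # Extend the shared k-mer to the left while characters agree.
                while p > 0 and q > 0 and s[p - 1] == t[q - 1]:
                    p -= 1
                    q -= 1
                # Extend to the right from the new start.
                n = 0
                while p + n < len(s) and q + n < len(t) and s[p + n] == t[q + n]:
                    n += 1
                seeds.add((i, p, j, q, n))
    return sorted(seeds)


def extend_right(s, t, p, q, n, window, max_err):
    """Extend a seed to the right, tolerating mismatches.

    Assumption: errors are substitutions only. The extension stops as soon
    as any `window` consecutive aligned columns would contain more than
    `max_err` mismatches. Returns the new match length.
    """
    flags = [0] * n  # one flag per aligned column; seed columns all match
    errs = 0         # mismatches among the last `window` columns
    while p + n < len(s) and q + n < len(t):
        mis = int(s[p + n] != t[q + n])
        leaving = flags[-window] if len(flags) >= window else 0
        if errs - leaving + mis > max_err:
            break
        errs += mis - leaving
        flags.append(mis)
        n += 1
    # Do not let the reported match end on a mismatch.
    while n > 0 and flags[n - 1]:
        n -= 1
    return n
```

Here exact_seeds plays the role of the seed-generation step and extend_right the role of the windowed extension; a left-hand extension would mirror extend_right.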



BMCBI 2007 | Gene Annotation Pipelines | Large Sets | Sequence Comparisons |


Post Info

Added	12 Dec 2010
Updated	12 Dec 2010
Type	Journal
Year	2007
Where	BMCBI
Authors	Jürgen Kleffe, Friedrich Möller, Burghardt Wittig
