Improving de novo sequence assembly using machine learning and comparative genomics for overlap correction

Background: With the rapid expansion of DNA sequencing databases, it is now feasible to identify relevant information from prior sequencing projects and completed genomes and apply it to de novo sequencing of new organisms. As an example, this paper demonstrates how such extra information can be used to improve de novo assemblies by augmenting the overlapping step. Finding all pairs of overlapping reads is a key task in many genome assemblers, and to this end, highly efficient algorithms have been developed to find alignments in large collections of sequences. It is well known that, because of repeated sequences, many aligned pairs of reads nevertheless do not overlap. However, no overlapping algorithm to date takes a rigorous approach to separating aligned but non-overlapping read pairs from true overlaps.

Results: We present an approach that extends the Minimus assembler with a data-driven step that classifies overlaps as true or false prior to contig construction. We trained several different clas...
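The distinction between an aligned read pair and a true overlap can be illustrated with a toy example. The sketch below is not the paper's classifier or any part of Minimus; the genome, the read start positions, and the names `suffix_prefix_overlap` and `label_candidates` are all hypothetical, chosen only to show how a repeat yields a suffix-prefix alignment between reads that come from different places in the genome. It assumes error-free reads whose true start positions are known, which holds only in simulation.

```python
from itertools import permutations

# Hypothetical toy genome: the same 12-base repeat occurs twice.
REPEAT = "ACGTACGGTCAT"
GENOME = "TTGACCA" + REPEAT + "GGCTAAGCTTCGA" + REPEAT + "CCATGTA"
READ_LEN = 16


def suffix_prefix_overlap(a, b, min_len):
    """Length of the longest suffix of a equal to a prefix of b, or 0 if shorter than min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def label_candidates(starts, min_len=10):
    """Find exact suffix-prefix alignments and label each with the ground truth.

    starts maps a read name to its (known, simulated) start position in GENOME.
    A candidate is a true overlap only if the two source intervals intersect.
    """
    reads = {name: GENOME[p:p + READ_LEN] for name, p in starts.items()}
    labelled = []
    for a, b in permutations(starts, 2):
        k = suffix_prefix_overlap(reads[a], reads[b], min_len)
        if k:
            is_true = abs(starts[a] - starts[b]) < READ_LEN
            labelled.append((a, b, k, is_true))
    return labelled


# Read A ends in the first repeat copy, C starts at the first copy,
# and B starts at the second copy.
STARTS = {"A": 3, "B": 32, "C": 7}

if __name__ == "__main__":
    for a, b, k, is_true in label_candidates(STARTS):
        kind = "true overlap" if is_true else "repeat-induced, not a true overlap"
        print(f"{a} -> {b}: {k} bases aligned ({kind})")
```

Both candidate pairs align over the same 12 bases, so alignment length alone cannot separate them. In a real de novo project the start positions are unknown, which is why a separate step is needed to decide which aligned pairs to keep.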

Lance E. Palmer, Mathäus Dejori, Randall A. B

BMCBI 2010 | Genome | Sequencing | True Overlaps |

» DecGPU distributed error correction on massively parallel graphics processing units using ...

» An algorithm for automated closure during assembly

» Reranking candidate gene models with crossspecies comparison for improved gene prediction

Post Info

Added	08 Dec 2010
Updated	08 Dec 2010
Type	Journal
Year	2010
Where	BMCBI
Authors	Lance E. Palmer, Mathäus Dejori, Randall A. Bolanos, Daniel P. Fasulo
