Machine Learning in Basecalling Decoding Trace Peak Behaviour

DNA sequence basecalling is commonly regarded as a solved problem, despite significant error rates being reflected in inaccuracies in databases and genome annotations. These errors commonly arise from an inability to sequence through peak height variations in DNA sequencing traces from the Sanger sequencing method. Recent efforts toward improving basecalling accuracy have taken the form of more sophisticated digital filters and feature detectors. We demonstrate that the variation in peak heights itself encodes novel information which can be used for basecalling. To isolate this information for a clear demonstration, we perform a peculiar blind basecalling experiment using ABI processed output. Using classifiers responding to measurements in the context of the basecalling position, we call bases without reference to the peak heights at the basecalling position itself. Tree classifiers indicate which features are pertinent, and the application of neural nets to these features results...
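The "blind" setup described in the abstract can be illustrated with a minimal Python sketch of context-only feature extraction. It is not the authors' implementation: the function name `context_features`, the window `radius`, the zero padding at the trace ends, and the layout of `peaks` (one list of four channel heights, ordered A, C, G, T, per called position) are all assumptions made for illustration.

```python
from typing import List, Sequence

BASES = "ACGT"  # assumed channel order; hypothetical, not taken from the paper


def context_features(
    peaks: Sequence[Sequence[float]], pos: int, radius: int = 2
) -> List[float]:
    """Gather four-channel peak heights from the neighbours of `pos`.

    The heights at `pos` itself are deliberately skipped, so a classifier
    trained on these features never sees the target position.
    """
    features: List[float] = []
    for offset in range(-radius, radius + 1):
        if offset == 0:
            continue  # blind: exclude the basecalling position itself
        idx = pos + offset
        if 0 <= idx < len(peaks):
            features.extend(peaks[idx])
        else:
            # Assumed padding for neighbours that fall outside the trace.
            features.extend([0.0] * len(BASES))
    return features


# Toy trace: four positions, one dominant channel each.
peaks = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 99.0, 0.0],
    [0.0, 0.0, 0.0, 4.0],
]
features = context_features(peaks, pos=2, radius=1)
```

A vector like `features` could then be passed to any classifier, such as the tree classifiers or neural nets the abstract mentions. The specific measurements used in the paper are not given in this listing.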

David Thornley, Stavros Petridis

CIBCB 2006 | DNA Sequence Basecalling | Peak Height Variations | Peak Heights

Added	10 Jun 2010
Updated	10 Jun 2010
Type	Conference
Year	2006
Where	CIBCB
Authors	David Thornley, Stavros Petridis