Probabilistic base calling of Solexa sequencing data


Background: Solexa/Illumina short-read ultra-high throughput DNA sequencing technology produces millions of short tags (up to 36 bases) by parallel sequencing-by-synthesis of DNA colonies. The processing and statistical analysis of such high-throughput data pose new challenges; currently a fair proportion of the tags are routinely discarded because they cannot be matched to a reference sequence, which reduces the effective throughput of the technology.

Results: We propose a novel base calling algorithm that uses model-based clustering and probability theory to identify ambiguous bases and code them with IUPAC symbols. We also select optimal sub-tags using a score based on information content to remove uncertain bases towards the ends of the reads.

Conclusion: We show that the method improves genome coverage and the number of usable tags by an average of 15% compared with Solexa's data processing pipeline. An R package is provided which allows fast and accurate base calling of S...
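To make the two ideas in the Results concrete, here is a minimal Python sketch. It is not the authors' algorithm or their R package: it assumes that per-base probabilities are already available for each read position, and the names call_base, information_content and trim_read, the 0.25 probability cutoff, and the 1-bit trimming threshold are all hypothetical choices made for illustration. The end-trimming rule is a simple stand-in for the paper's information-content score, whose exact form is not given in the abstract.

```python
import math

# Standard IUPAC nucleotide codes, keyed by the set of bases each one represents.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def information_content(probs):
    """Information content in bits: 2 minus the Shannon entropy of the base distribution."""
    entropy = -sum(p * math.log2(p) for p in probs.values() if p > 0)
    return 2.0 - entropy


def call_base(probs, cutoff=0.25):
    """Return the IUPAC symbol for all bases with probability >= cutoff.

    The cutoff is an assumed, illustrative rule. The most probable base is
    always kept so that the set of candidate bases is never empty.
    """
    best = max(probs, key=probs.get)
    kept = {base for base, p in probs.items() if p >= cutoff} | {best}
    return IUPAC[frozenset(kept)]


def trim_read(read_probs, min_bits=1.0):
    """Drop low-information positions from both ends, then call the remaining bases.

    Assumed rule: a position at either end is removed while its information
    content is below min_bits; interior positions are kept and may be ambiguous.
    """
    bits = [information_content(p) for p in read_probs]
    start, end = 0, len(bits)
    while start < end and bits[start] < min_bits:
        start += 1
    while end > start and bits[end - 1] < min_bits:
        end -= 1
    return "".join(call_base(p) for p in read_probs[start:end])
```

With these assumed thresholds, a position where A and G are almost equally likely is reported as R rather than being forced to a single base, and a final position with a uniform distribution is trimmed instead of being called N.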

Jacques Rougemont, Arnaud Amzallag, Christian Iseli, Laurent Farinelli, Ioannis Xenarios, Felix Naef


BMCBI 2008 | Image Files | Sequencing | Tags |


» naiveBayesCall: An Efficient Model-Based Base-Calling Algorithm for High-Throughput Sequencing

» Accelerating error correction in high-throughput short-read DNA sequencing data with CUDA

» Efficient alignment of pyrosequencing reads for resequencing applications

» Fuzzy Intrusion Detection System via Data Mining Technique with Sequences of System Calls

» Data Mining Approach for Analyzing Call Center Performance

» Predicting the intrusion intentions by observing system call sequences

» An effective approach for identification of in vivo protein-DNA binding sites from paired-en...

» Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

Post Info

Added	09 Dec 2010
Updated	09 Dec 2010
Type	Journal
Year	2008
Where	BMCBI
Authors	Jacques Rougemont, Arnaud Amzallag, Christian Iseli, Laurent Farinelli, Ioannis Xenarios, Felix Naef
