

Clustering exact matches of pairwise sequence alignments by weighted linear regression

Background: At intermediate stages of genome assembly projects, when a number of contigs have been generated and their validity needs to be verified, it is desirable to align these contigs to a reference genome when one is available. The goal is not a detailed base-level alignment between a contig and the reference genome, but rather a rough estimate of where the contig aligns to the reference, specifically the starting and ending positions of that region. This information is very useful for ordering the contigs and facilitates post-assembly analysis such as gap closure and repeat resolution. Programs such as BLAST and MUMmer can quickly align two sequences and identify high-similarity segments between them. In a dot plot, these segments tend to agglomerate along a diagonal, but they can also be disrupted by gaps or shifted away from the main diagonal due to mismatches between the contig and the reference. It is a tedious and...
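The abstract is cut off before the method is described, so the following is only a minimal Python sketch of the general idea named in the title, not the authors' algorithm. It assumes each exact match (for example a MUM reported by MUMmer) is available as a forward-strand triple (ref_start, contig_start, length) with positive length; the names Match, weighted_median, cluster_and_fit and estimate_region, and the max_offset threshold, are hypothetical. Matches are first clustered around the length-weighted median diagonal (ref_start - contig_start) to discard off-diagonal hits. A line ref = a + b * contig is then fitted to the match midpoints by length-weighted least squares, and the contig's two endpoints are projected onto the reference to estimate where the aligned region starts and ends.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Match:
    """One exact match between contig and reference (forward strand only; assumption)."""
    ref_start: int     # 0-based start in the reference genome
    contig_start: int  # 0-based start in the contig
    length: int        # match length in bases, used as the regression weight


def weighted_median(values: List[float], weights: List[float]) -> float:
    """Smallest value whose cumulative weight reaches half of the total weight."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    return pairs[-1][0]


def cluster_and_fit(matches: List[Match], max_offset: int = 200) -> Tuple[float, float]:
    """Keep matches near the dominant diagonal, then fit ref = a + b * contig
    by weighted least squares on match midpoints, weighting by match length."""
    if not matches:
        raise ValueError("need at least one match")
    diagonals = [m.ref_start - m.contig_start for m in matches]
    center = weighted_median(diagonals, [float(m.length) for m in matches])
    # Clustering step (hypothetical): drop hits whose diagonal is far from the center.
    kept = [m for m, d in zip(matches, diagonals) if abs(d - center) <= max_offset]

    w = [float(m.length) for m in kept]
    x = [m.contig_start + m.length / 2.0 for m in kept]  # midpoint on the contig
    y = [m.ref_start + m.length / 2.0 for m in kept]     # midpoint on the reference
    total = sum(w)
    x_mean = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxx = sum(wi * (xi - x_mean) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_mean) * (yi - y_mean) for wi, xi, yi in zip(w, x, y))
    # With one kept match (or identical midpoints) the slope is undefined;
    # fall back to slope 1, i.e. an ungapped placement along the diagonal.
    slope = sxy / sxx if sxx > 0 else 1.0
    intercept = y_mean - slope * x_mean
    return intercept, slope


def estimate_region(matches: List[Match], contig_length: int,
                    max_offset: int = 200) -> Tuple[int, int]:
    """Project contig positions 0 and contig_length onto the reference."""
    intercept, slope = cluster_and_fit(matches, max_offset)
    start = intercept
    end = intercept + slope * contig_length
    return round(min(start, end)), round(max(start, end))


if __name__ == "__main__":
    hits = [
        Match(ref_start=1000, contig_start=0, length=50),
        Match(ref_start=1100, contig_start=100, length=80),
        Match(ref_start=1300, contig_start=300, length=40),
        Match(ref_start=50000, contig_start=200, length=20),  # off-diagonal hit
    ]
    print(estimate_region(hits, contig_length=400))  # (1000, 1400)
```

Weighting by match length lets long exact matches dominate the fit, so a few short spurious hits barely move the line. The weighted-median pre-filter is a swapped-in simplification of the clustering step; it would need extending for reverse-complement matches or for contigs that span a large rearrangement.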
Alvaro J. González, Li Liao
Added 09 Dec 2010
Updated 09 Dec 2010
Type Journal
Year 2008