Abstract. Several computer programs align an mRNA with its genomic counterpart to determine exon boundaries. Although most of these programs perform such alignments efficiently and accurately, they tolerate only a relatively small number of sequencing errors. They also rely heavily on the GT/AG rule to find splice sites. Both properties make them less suitable for aligning EST-reconstructed transcripts with genomic DNA to identify splicing variants, a setting in which many sequencing errors and noncanonical splice sites are expected. Using a novel heuristic algorithm, we developed a tool that can handle many more sequencing errors. Results on test datasets indicated that SWAT (Sequencing-error Well-handled Alignment Tool) has a much stronger error-handling ability than Sim4 and Spidey, two other popular spliced alignment tools. In the presence of up to 10 percent randomly introduced sequencing errors, it can still give the precise number of exons and exon bound...
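To make the GT/AG rule concrete, here is a minimal sketch of the kind of check a GT/AG-dependent aligner might apply to a candidate intron. It is not SWAT's algorithm, and the function name `is_canonical_intron` is hypothetical. It assumes the genomic sequence is already on the transcript's sense strand and that intron coordinates are 0-based with an exclusive end. A noncanonical intron, for example one with a GC-AG boundary, fails this check even though it is real.

```python
# Canonical splice-site dinucleotides on the genomic (sense) strand:
# an intron usually begins with GT (donor site) and ends with AG (acceptor site).
CANONICAL_DONOR = "GT"
CANONICAL_ACCEPTOR = "AG"


def is_canonical_intron(genomic: str, start: int, end: int) -> bool:
    """Return True if genomic[start:end] starts with GT and ends with AG.

    Hypothetical helper for illustration: `start` is the 0-based index of the
    first intron base and `end` is exclusive.
    """
    intron = genomic[start:end].upper()
    # Require at least 4 bases so the donor and acceptor do not overlap.
    return (
        len(intron) >= 4
        and intron.startswith(CANONICAL_DONOR)
        and intron.endswith(CANONICAL_ACCEPTOR)
    )


# Toy example: exon "ATGCC", intron "GTAAGTTTCAG", exon "GGTTA".
canonical = "ATGCC" + "GTAAGTTTCAG" + "GGTTA"
# Same layout, but the intron starts with GC (a noncanonical GC-AG intron).
noncanonical = "ATGCC" + "GCAAGTTTCAG" + "GGTTA"

print(is_canonical_intron(canonical, 5, 16))     # True
print(is_canonical_intron(noncanonical, 5, 16))  # False
```

A tool that accepts only introns passing this test will reject or misplace the GC-AG boundary above, which is one reason strict reliance on the rule is a drawback when noncanonical splice sites are expected.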
Yifeng Li, Hesham H. Ali