Fast index based algorithms and software for matching position specific scoring matrices


Download www.biomedcentral.com

Background: In biological sequence analysis, position specific scoring matrices (PSSMs) are widely used to represent sequence motifs in nucleotide as well as amino acid sequences. Searching with PSSMs in complete genomes or large sequence databases is a common but computationally expensive task.

Results: We present a new non-heuristic algorithm, called ESAsearch, to efficiently find matches of PSSMs in large databases. Our approach preprocesses the search space, e.g., a complete genome or a set of protein sequences, and builds an enhanced suffix array that is stored on file. This allows a database to be searched with a PSSM in sublinear expected time. Since ESAsearch benefits from small alphabets, we present a variant operating on sequences recoded according to a reduced alphabet. We also address the problem of non-comparable PSSM scores by developing a method that allows the efficient computation of a matrix similarity threshold for a PSSM, given an E-value or a p-value. Our meth...
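To illustrate why a suffix index helps, here is a minimal Python sketch of index-based PSSM matching with lookahead pruning. It is not ESAsearch itself: it assumes a plain suffix array built by naive sorting plus an lcp array, rather than an enhanced suffix array stored on file, and all function names (`build_suffix_array`, `lcp_array`, `intermediate_thresholds`, `esa_like_search`) are hypothetical. A PSSM is assumed to be a list of per-position dictionaries mapping residues to scores. Suffixes that share a prefix also share its partial score, so each shared prefix is scored once; and as soon as a partial score plus the best possible remaining score falls below the threshold, every suffix starting with that prefix is skipped.

```python
from typing import Dict, List, Tuple

PSSM = List[Dict[str, float]]  # hypothetical representation: one residue->score map per position


def build_suffix_array(text: str) -> List[int]:
    # Naive construction by sorting suffixes; adequate for a sketch, not for genomes.
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text: str, sa: List[int]) -> List[int]:
    # lcp[k] = length of the longest common prefix of suffixes sa[k-1] and sa[k]; lcp[0] = 0.
    n, lcp = len(text), [0] * len(sa)
    for k in range(1, len(sa)):
        a, b, l = sa[k - 1], sa[k], 0
        while a + l < n and b + l < n and text[a + l] == text[b + l]:
            l += 1
        lcp[k] = l
    return lcp


def intermediate_thresholds(pssm: PSSM, threshold: float) -> List[float]:
    # th[d]: minimum score the first d+1 positions need so the full match can still
    # reach `threshold`, assuming every remaining position scores its column maximum.
    m = len(pssm)
    max_suffix = [0.0] * (m + 1)
    for d in range(m - 1, -1, -1):
        max_suffix[d] = max_suffix[d + 1] + max(pssm[d].values())
    return [threshold - max_suffix[d + 1] for d in range(m)]


def esa_like_search(text: str, pssm: PSSM, threshold: float) -> List[Tuple[int, float]]:
    n, m = len(text), len(pssm)
    sa, lcp = build_suffix_array(text), None
    lcp = lcp_array(text, sa)
    th = intermediate_thresholds(pssm, threshold)
    prefix = [0.0] * (m + 1)  # prefix[d] = score of the first d positions of the current suffix
    hits: List[Tuple[int, float]] = []
    k, computed = 0, 0  # computed = number of prefix scores valid for the last scored suffix
    while k < n:
        start = sa[k]
        d = min(computed, lcp[k])  # reuse the scores of the prefix shared with that suffix
        failed = False
        while d < m:
            if start + d >= n:  # suffix is shorter than the PSSM: no match here
                break
            # Residues missing from the PSSM (e.g. 'N') score -inf and fail immediately.
            prefix[d + 1] = prefix[d] + pssm[d].get(text[start + d], float("-inf"))
            d += 1
            if prefix[d] < th[d - 1]:  # lookahead: threshold is no longer reachable
                failed = True
                break
        computed = d
        k += 1
        matched = d == m and not failed
        if failed or matched:
            if matched:
                hits.append((start, prefix[m]))
            # Every following suffix sharing these d characters gets the same outcome.
            while k < n and lcp[k] >= d:
                if matched:
                    hits.append((sa[k], prefix[m]))
                k += 1
    return sorted(hits)


if __name__ == "__main__":
    acg = [{"A": 2, "C": -1, "G": -1, "T": -1},
           {"A": -1, "C": 2, "G": -1, "T": -1},
           {"A": -1, "C": -1, "G": 2, "T": -1}]
    print(esa_like_search("ACGTACGA", acg, 6))  # both exact "ACG" occurrences
```

Here the skip over pruned suffixes scans the lcp array linearly; an enhanced suffix array can make such jumps cheaper, and the naive suffix-array construction shown is only suitable for small inputs.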
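The abstract mentions deriving a score threshold from a p-value. One straightforward way to do this, not necessarily the authors' method, is a dynamic program over the score distribution of a random word. The sketch below assumes integer PSSM scores, an i.i.d. background model given as a residue-to-probability map covering every residue in the PSSM, and hypothetical function names `score_distribution` and `threshold_for_pvalue`.

```python
from typing import Dict, List


def score_distribution(pssm: List[Dict[str, int]], background: Dict[str, float]) -> Dict[int, float]:
    # Probability of each total score for a word drawn i.i.d. from `background`.
    dist: Dict[int, float] = {0: 1.0}
    for column in pssm:
        nxt: Dict[int, float] = {}
        for score, p in dist.items():
            for residue, q in background.items():
                t = score + column[residue]
                nxt[t] = nxt.get(t, 0.0) + p * q
        dist = nxt
    return dist


def threshold_for_pvalue(pssm: List[Dict[str, int]], background: Dict[str, float], pvalue: float) -> int:
    # Smallest attainable score s with P(score >= s) <= pvalue; if even the
    # maximum score is too likely, return max + 1 (no word can match).
    dist = score_distribution(pssm, background)
    best = max(dist) + 1
    tail = 0.0
    for s in sorted(dist, reverse=True):
        tail += dist[s]  # tail == P(score >= s)
        if tail > pvalue:
            break
        best = s
    return best


if __name__ == "__main__":
    acg = [{"A": 2, "C": -1, "G": -1, "T": -1},
           {"A": -1, "C": 2, "G": -1, "T": -1},
           {"A": -1, "C": -1, "G": 2, "T": -1}]
    uniform = {c: 0.25 for c in "ACGT"}
    print(threshold_for_pvalue(acg, uniform, 0.02))
```

With a uniform background, the exact match "ACG" (score 6) has probability 1/64 ≈ 0.0156, so a p-value of 0.02 admits only that score. To start from an E-value instead, one common convention is to divide it by the number of scored positions in the database to obtain a per-position p-value.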

Michael Beckstette, Robert Homann, Robert Giegerich, Stefan Kurtz


Amino Acid | BMCBI 2006 | Complete Genome | Specific Scoring Matrices |


» Improving Remote Homology Detection Using Sequence Properties and Position Specific Scorin...

» DNA Motif Representation with Nucleotide Dependency

» Knowledgebased annotation of small molecule binding sites in proteins

Post Info

Added	10 Dec 2010
Updated	10 Dec 2010
Type	Journal
Year	2006
Where	BMCBI
Authors	Michael Beckstette, Robert Homann, Robert Giegerich, Stefan Kurtz


Sciweavers
