GCB
1997
Springer

Statistics of large scale sequence searching

Motivation: Database search programs such as FASTA, BLAST, or a rigorous Smith–Waterman algorithm produce lists of database entries that are assumed to be related to the query. The computation of statistical significance of similarity scores is well established for single pairs of sequences under purely random models. However, the multi-trial context of a database search poses new problems: the credibility of a given score obtained in a database search decreases as the amount of data compared grows. To improve p-value computation for database search experiments, statistical properties of the databases, such as the distribution of sequence lengths and effects induced by frequently repeated sequence patterns, need to be taken into account. Results: We investigated the SWISS-PROT protein database
Rainer Spang, Martin Vingron
Type Conference