Motivation: Database search programs such as FASTA, BLAST or a rigorous Smith–Waterman algorithm produce lists of database entries that are assumed to be related to the query. Computing the statistical significance of similarity scores is well established for single pairs of sequences under purely random models. However, the multi-trial context of a database search poses new problems: the significance of a given score obtained in a database search decreases as the amount of data compared grows. To improve p-value computation for database search experiments, statistical properties of the databases, such as the distribution of sequence lengths and the effects of frequently repeated sequence patterns, need to be taken into account.
Results: We investigated the SWISS-PROT protein database
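The claim that a score loses credibility as more data is compared can be made concrete with the textbook multiple-testing correction. The sketch below is an illustration only, not the authors' method. It assumes that the database comparisons are independent and that a single-pair p-value is already known. Under those assumptions, the probability that at least one of N comparisons scores at least as high by chance is 1 - (1 - p)^N. The helper names `database_pvalue` and `expected_chance_hits` are hypothetical.

```python
import math


def database_pvalue(pair_pvalue: float, n_comparisons: int) -> float:
    """Naive database-level p-value, ASSUMING N independent comparisons.

    Returns 1 - (1 - p)^N: the chance that at least one of the N
    comparisons reaches the score by chance alone.
    """
    if not 0.0 <= pair_pvalue <= 1.0:
        raise ValueError("pair_pvalue must lie in [0, 1]")
    if pair_pvalue == 1.0:
        return 1.0
    # log1p/expm1 keep the result accurate when p is tiny and N is large
    return -math.expm1(n_comparisons * math.log1p(-pair_pvalue))


def expected_chance_hits(pair_pvalue: float, n_comparisons: int) -> float:
    """Expected number of chance hits at this p-value (Bonferroni-style p * N)."""
    return pair_pvalue * n_comparisons


# A score with p = 1e-5 for a single pair is striking on its own,
# but across 100,000 comparisons a chance hit becomes likely (about 1 - 1/e).
for n in (1, 1_000, 100_000):
    print(n, database_pvalue(1e-5, n), expected_chance_hits(1e-5, n))
```

This correction ignores the sequence-length distribution and repeated patterns. Its independence assumption is exactly what the text above argues is insufficient for real databases.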