Position-specific scoring matrices are useful for representing and searching for protein sequence motifs. A sequence family can often be described by a group of one or more motifs, and an effective search must combine the scores for matching a sequence to each of the motifs in the group. We describe three methods for combining match scores and estimating the statistical significance of the combined scores. For each method, we evaluate the search quality (classification accuracy) and the accuracy of its estimate of statistical significance. The three methods are: 1) sum of scores, 2) sum of reduced variates, and 3) product of score p-values. We show that method 3 is superior to the other two methods in both regards, and that combining motif scores does improve search accuracy. The MAST sequence homology search algorithm, which uses the product-of-p-values scoring method, is available for interactive use and downloading at URL http://www.sdsc.edu/MEME.
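The three combination methods named in the abstract can be sketched as below. This is a minimal illustration, not MAST's implementation. The function names are hypothetical. For method 2 we assume a "reduced variate" is each motif score standardized by that motif's null mean and standard deviation. For method 3 we assume the per-motif p-values are independent and uniform on (0, 1) under the null hypothesis. Under that assumption, the product p of n such p-values satisfies P(product ≤ p) = p · Σ_{i=0}^{n-1} (-ln p)^i / i!.

```python
import math


def sum_of_scores(scores):
    """Method 1: add the raw motif match scores."""
    return sum(scores)


def sum_of_reduced_variates(scores, means, sds):
    """Method 2 (assumed form): standardize each motif score by its null
    mean and standard deviation, then add the standardized values."""
    return sum((s - m) / sd for s, m, sd in zip(scores, means, sds))


def product_pvalue(pvalues):
    """Method 3: p-value of the product of n motif p-values.

    Assumes the p-values are independent and uniform on (0, 1) under the
    null, in which case P(prod <= p) = p * sum_{i=0}^{n-1} (-ln p)^i / i!.
    """
    n = len(pvalues)
    p = math.prod(pvalues)
    if p <= 0.0:
        return 0.0
    x = -math.log(p)
    term = 1.0   # current series term (-ln p)^i / i!, starting at i = 0
    total = 1.0  # running sum of the series
    for i in range(1, n):
        term *= x / i
        total += term
    return p * total


# Example: three motifs matched with p-values 0.01, 0.2 and 0.5.
combined = product_pvalue([0.01, 0.2, 0.5])
print(f"combined p-value: {combined:.4g}")
```

With a single motif the function returns that motif's p-value unchanged. With several motifs the raw product is corrected upward, because the product of n uniform values is small by chance more often than a single uniform value.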
Timothy L. Bailey, Michael Gribskov