Sciweavers

RECOMB
1999
Springer

Classifying proteins by family using the product of correlated p-values

An important goal in bioinformatics is determining the homology and function of proteins from their sequences. Pairwise sequence similarity algorithms are often employed for this purpose. This paper describes a method for improving the accuracy of such algorithms using knowledge about families of proteins. The method requires a library of protein families against which to compare query sequences. A standard pairwise similarity search algorithm is used to search the library with the query, and a new variant of the Family Pairwise Search (FPS) algorithm converts the results into a list sorted by the E-values of the matches between the query and the families. The E-value of each query-family match is calculated using a statistical distribution introduced here that describes the behavior of the product of the p-values of correlated random variables. We also describe an algorithm (ESIZE) for estimating the single parameter of this distribution. This parameter summarizes the amount of corre...
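As a point of reference for the statistic the abstract mentions, the sketch below computes the p-value of a product of p-values in the simpler case where they are independent and uniform on (0, 1], using the well-known closed form P(product <= p) = p * sum_{i=0}^{n-1} (-ln p)^i / i!. This is not the distribution introduced in the paper, which handles correlated p-values through a single parameter estimated by ESIZE; that distribution is not reproduced here. The function names `product_pvalue` and `family_evalue` are hypothetical, and scaling a p-value by the number of families to get an E-value is an assumption based on the usual E-value convention, not a detail taken from the abstract.

```python
import math


def product_pvalue(pvalues):
    """P-value of the product of n INDEPENDENT uniform p-values.

    Baseline only: the paper's distribution for correlated
    p-values is different and is not implemented here.
    Each p-value must lie in (0, 1].
    """
    n = len(pvalues)
    if n == 0:
        raise ValueError("need at least one p-value")
    if any(p <= 0.0 or p > 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")

    # Sum logs instead of multiplying to avoid underflow.
    log_prod = sum(math.log(p) for p in pvalues)
    x = -log_prod  # x = -ln(product), always >= 0

    # Accumulate sum_{i=0}^{n-1} x^i / i! term by term.
    term = 1.0
    total = 1.0
    for i in range(1, n):
        term *= x / i
        total += term

    return min(1.0, math.exp(log_prod) * total)


def family_evalue(pvalue, n_families):
    """Hypothetical helper: E-value as p-value times library size."""
    return pvalue * n_families
```

With a single p-value the function returns that p-value unchanged, which is a quick sanity check on the formula. Positively correlated p-values would make this independent-case result too small, which is the problem the paper's distribution is meant to address.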
Timothy L. Bailey, William Noble Grundy
Added 04 Aug 2010
Updated 04 Aug 2010
Type Conference
Year 1999
Where RECOMB
Authors Timothy L. Bailey, William Noble Grundy