Computational analyses of protein structure-function relationships have traditionally relied on sequence homology, fold-family analysis, and 3D motifs or templates. Previous structure-based approaches characterize and compare active sites by their global shape and electrostatic properties. However, these methods cannot capture similarities between diverse active sites that span multiple fold families yet catalyze the same reaction (convergent evolution). In this work, we extend previous feature-based analyses of active sites by defining a system of localized geometric and electrostatic descriptors that identify local patterns of protein-ligand interactions. Singular value decomposition (SVD) is used to identify linear combinations of features with maximum information content, which are then used to compute the class-conditional probability density of active sites via kernel density estimation. We successfully tested our algorithm on a database that contained e...
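The pipeline described above (SVD to find informative linear combinations of features, then kernel density estimation of class-conditional densities) can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes a single SVD computed over all active sites pooled together, an isotropic Gaussian kernel with a fixed, hand-picked bandwidth, equal class priors (so a site is assigned to the class with the highest class-conditional density), and synthetic 10-dimensional feature vectors standing in for the real geometric and electrostatic descriptors. The function names `fit_svd_projection`, `project`, `gaussian_kde_logpdf`, and `classify` are hypothetical.

```python
import numpy as np

def fit_svd_projection(X, k):
    """Center the feature matrix X (sites x features) and return the mean
    and the top-k right singular vectors (directions of largest variance)."""
    mean = X.mean(axis=0)
    # Rows of Vt are orthonormal and ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def project(X, mean, components):
    """Map feature vectors onto the retained SVD components."""
    return (X - mean) @ components.T

def gaussian_kde_logpdf(points, samples, bandwidth):
    """Log density at `points` of an isotropic Gaussian KDE built on `samples`."""
    d = samples.shape[1]
    diffs = points[:, None, :] - samples[None, :, :]       # (m, n, d)
    a = -0.5 * np.sum(diffs ** 2, axis=2) / bandwidth ** 2  # (m, n) log-kernel terms
    log_norm = -0.5 * d * np.log(2 * np.pi) - d * np.log(bandwidth)
    # Log-mean-exp over the training samples, stabilized by the row maximum.
    amax = a.max(axis=1, keepdims=True)
    return amax[:, 0] + np.log(np.mean(np.exp(a - amax), axis=1)) + log_norm

def classify(points, class_samples, bandwidth):
    """Assign each point to the class with the highest class-conditional
    density (equivalent to Bayes' rule when class priors are equal)."""
    labels = sorted(class_samples)
    scores = np.stack(
        [gaussian_kde_logpdf(points, class_samples[c], bandwidth) for c in labels],
        axis=1,
    )
    return [labels[i] for i in np.argmax(scores, axis=1)]

# Synthetic stand-in data: two hypothetical active-site classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 10)),
               rng.normal(3.0, 1.0, size=(50, 10))])
y = np.array(["A"] * 50 + ["B"] * 50)

mean, comps = fit_svd_projection(X, k=3)
Z = project(X, mean, comps)
class_samples = {c: Z[y == c] for c in ("A", "B")}

queries = project(np.vstack([np.zeros(10), np.full(10, 3.0)]), mean, comps)
print(classify(queries, class_samples, bandwidth=1.0))
```

Working in log space avoids underflow when a query lies far from every sample of a class. In practice the number of retained components and the kernel bandwidth would need to be tuned, for example by cross-validation, rather than fixed as they are here.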
Reetal Pai, James C. Sacchettini, Thomas R. Ioerger