Abstract. Protein fold recognition is an important step towards understanding protein three-dimensional structures and their functions. A conditional graphical model, the segmentation conditional random field (SCRF), is proposed to solve the problem. In contrast to traditional graphical models such as hidden Markov models (HMMs), SCRFs follow a discriminative approach. They have the flexibility to include overlapping or long-range interaction features over the whole sequence, and their parameters admit globally optimal solutions. In addition, the segmentation setting in SCRFs makes their graphical structures intuitively similar to protein 3-D structures and, more importantly, provides a framework to model long-range interactions directly. Our model is applied to predict the parallel β-helix fold, an important fold in bacterial infection of plants and binding of antigens. Cross-family validation shows that SCRFs not only can score all known β-helices higher than non-β-helices...
Yan Liu, Jaime G. Carbonell, Peter Weigele, Vanath