We consider the automated identification of transmembrane domains in membrane protein sequences. 324 proteins (containing 1585 segments) were examined, representing every protein in the PIR database having the transmembrane domain feature annotation. Machine learning techniques were used to evaluate the efficacy of alternative hydrophobicity measures and windowing techniques. We describe a simpler measure of hydrophobicity and a new variable window size concept. We demonstrate that these techniques are superior to some previous techniques in minimizing the segment error rate. Using these new techniques, we describe an algorithm that has a 7.9% segment error rate on the sampled proteins, while classifying 16.7% of the amino acid residues as transmembrane.
Sholom M. Weiss, Dawn M. Cohen, Nitin Indurkhya
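
For readers unfamiliar with window-based hydrophobicity analysis, the following is a minimal sketch of the conventional fixed-window baseline that such work is typically compared against. It is not the simpler measure or the variable window size method described in the abstract. The scale is the published Kyte-Doolittle hydropathy index; the window length of 19, the threshold of 1.6, and the function names `window_means` and `candidate_segments` are assumptions chosen for illustration.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_means(seq, window=19):
    """Mean hydropathy of every full window of length `window` in `seq`."""
    scores = [KD[aa] for aa in seq]
    if len(scores) < window:
        return []
    total = sum(scores[:window])
    means = [total / window]
    # Slide the window one residue at a time, updating the running sum.
    for i in range(window, len(scores)):
        total += scores[i] - scores[i - window]
        means.append(total / window)
    return means


def candidate_segments(seq, window=19, threshold=1.6):
    """Return (start, end) spans, end exclusive, covered by runs of
    consecutive windows whose mean hydropathy meets the threshold."""
    segments = []
    start = None
    end = None
    for i, mean in enumerate(window_means(seq, window)):
        if mean >= threshold:
            if start is None:
                start = i
            end = i + window  # last residue covered by this window, exclusive
        elif start is not None:
            segments.append((start, end))
            start = None
    if start is not None:
        segments.append((start, end))
    return segments


if __name__ == "__main__":
    # Synthetic sequence: a hydrophobic leucine run flanked by charged residues.
    demo = "D" * 20 + "L" * 21 + "K" * 20
    print(candidate_segments(demo))
```

Because the reported span runs from the first qualifying window to the end of the last one, a fixed window tends to overextend a segment into its flanking residues. That is one reason the window size is worth treating as a parameter to evaluate rather than a constant.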