Cytochrome b561 (Cyt-b561) proteins play important roles in plants, including anti-toxin defense reactions, growth and development, and prevention of damage from excess light under drought conditions. Because of their high sequence divergence, thorough mining of Cyt-b561 and related proteins from diverse plant genomes is not easy. For example, only one Cyt-b561 gene is currently known in the maize genome and none has been found in the soybean genome, whereas twenty-two are known in the Arabidopsis thaliana genome. Alignment-free methods for protein classification, e.g., multivariate statistical analysis methods that use various amino acid properties as sequence descriptors, can be more sensitive than commonly used alignment-based methods for identifying remotely similar proteins. In order to identify Cyt-b561 proteins thoroughly from available plant genomes, we examined alignment-free protein classifiers based on partial least squares (PLS) and support vector machine...
Stephen O. Opiyo, Etsuko N. Moriyama