Determination of the functions of all expressed proteins represents one of the major upcoming challenges in computational molecular biology. Since subcellular location plays a crucial role in protein function, systems that can predict location from sequence, or high-throughput systems that determine location experimentally, will be essential to the full characterization of expressed proteins. The development of prediction systems is currently hindered by an absence of training data that adequately captures the complexity of protein localization patterns. What is needed is a systematics for the subcellular locations of proteins. This paper describes an approach to the quantitative description of protein localization patterns using numerical features, and the use of these features to develop classifiers that can recognize all major subcellular structures in fluorescence microscope images. Such classifiers provide a valuable tool for experiments aimed at determining the ...
Robert F. Murphy, Michael V. Boland, Meel Velliste