We first analyzed protein names using various dictionaries and databases and found five problems: the treatment of special characters, the treatment of homonyms, cases where a protein-name string is a substring of a different protein-name string, cases where one protein exists in different organisms, and the treatment of modifiers. We confirmed that a machine-learning approach to protein-name recognition can address these problems, and machine-learning methods have indeed recently been used in research to recognize protein names. A classifier trained on a specific domain, however, can overfit and be so inflexible that it can be used only in that domain. We therefore developed a new corpus on breast cancer and investigated the flexibility of classifiers trained on either the GENIA corpus [1] or the breast-cancer corpus. We used a transductive support vector machine (SVM) to avoid overfitting, and we evaluated the effect of transductive learning. We foun...