Text classification systems for biomedical literature aim to select, from large corpora, the articles relevant to a specific issue. Most systems that achieve acceptable accuracy rely on domain knowledge, which is very expensive to acquire and does not provide a general solution. This paper presents a novel approach to text classification of biomedical literature that uses information extracted from related web resources. We validated this approach by implementing the proposed method and testing it on the KDD2002 Cup challenge: bio-text task. The results show that our approach can effectively improve the efficiency of text classification systems for biomedical literature.

Categories and Subject Descriptors
H.2.8 [Database Management]: Database Applications—Bioinformatics (genome or protein) databases, Feature extraction or construction, Text mining, Web mining

Keywords
biomedical text classification
Francisco M. Couto, Bruno Martins, Mário J.