An application of text categorization methods to gene ontology annotation

16 years 4 days ago


This paper describes an application of IR and text categorization methods to a highly practical problem in biomedicine, specifically, Gene Ontology (GO) annotation. GO annotation is a major activity in most model organism database projects and annotates gene functions using a controlled vocabulary. As a first step toward automatic GO annotation, we aim to assign GO domain codes given a specific gene and an article in which the gene appears, which is one of the task challenges at the TREC 2004 Genomics Track. We approached the task with careful consideration of the specialized terminology and paid special attention to dealing with various forms of gene synonyms, so as to exhaustively locate the occurrences of the target gene. We extracted the words around the gene occurrences and used them to represent the gene for GO domain code annotation. As a classifier, we adopted a variant of k-Nearest Neighbor (kNN) with supervised term weighting schemes to improve the performance, making ou...
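The pipeline the abstract outlines (locate gene mentions via synonyms, collect nearby words, classify with a weighted kNN) might be sketched roughly as follows. This is only an illustration under several assumptions, not the authors' implementation: the function names, the five-token window, the single-token synonym matching, the log-ratio term weight, and the toy training data are all hypothetical, since the abstract does not specify the kNN variant or the weighting schemes used. The labels BP, MF, and CC stand for the three GO domains (biological process, molecular function, cellular component).

```python
import math
import re
from collections import Counter, defaultdict


def context_words(text, synonyms, window=5):
    """Count words within `window` tokens of any (single-token) gene synonym."""
    tokens = re.findall(r"[a-z0-9\-]+", text.lower())
    targets = {s.lower() for s in synonyms}
    bag = Counter()
    for i, tok in enumerate(tokens):
        if tok in targets:
            lo, hi = max(0, i - window), i + window + 1
            bag.update(t for t in tokens[lo:hi] if t not in targets)
    return bag


def term_weights(train):
    """Assumed stand-in for supervised term weighting: a term scores higher
    when the documents containing it concentrate in one label."""
    labels = set().union(*(ls for _, ls in train))
    df = defaultdict(Counter)  # df[label][term]: docs with that label containing term
    total = Counter()          # total[term]: all docs containing term
    for bag, ls in train:
        for t in bag:
            total[t] += 1
            for label in ls:
                df[label][t] += 1
    return {
        t: max(math.log2(2 + df[l][t] / max(1, total[t] - df[l][t])) for l in labels)
        for t in total
    }


def weighted(bag, weights):
    """Scale raw term counts by the learned weights (unseen terms get 1.0)."""
    return {t: tf * weights.get(t, 1.0) for t, tf in bag.items()}


def cosine(a, b):
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_scores(query_bag, train, weights, k=3):
    """Sum the similarities of the k nearest training examples per label."""
    q = weighted(query_bag, weights)
    nearest = sorted(
        ((cosine(q, weighted(bag, weights)), ls) for bag, ls in train),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    scores = Counter()
    for sim, ls in nearest:
        for label in ls:
            scores[label] += sim
    return scores


# Toy, made-up training examples: (context-word counts, GO domain codes).
train = [
    (Counter({"kinase": 2, "activity": 1, "binding": 1}), {"MF"}),
    (Counter({"binding": 2, "atp": 1}), {"MF"}),
    (Counter({"nucleus": 2, "localized": 1}), {"CC"}),
    (Counter({"membrane": 1, "localized": 2}), {"CC"}),
    (Counter({"apoptosis": 2, "pathway": 1}), {"BP"}),
]
weights = term_weights(train)

text = "We found that ABC1 shows strong kinase activity and ATP binding in vitro."
bag = context_words(text, synonyms=["ABC1", "abc-1"])  # hypothetical gene names
scores = knn_scores(bag, train, weights, k=3)
best_label = scores.most_common(1)[0][0]
```

Because a gene-article pair can carry more than one domain code, a fuller version would likely threshold `scores` per label instead of taking only `best_label`; multi-word synonyms would also need phrase matching rather than the token lookup used here.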

Kazuhiro Seki, Javed Mostafa
