Identifying Gene Function Descriptions by Probability-based Sentence Selection



This paper proposes an approach to the secondary task of the TREC Genomics Track. We treat the task as identifying the sentences that describe gene functions (i.e., GeneRIFs) and propose a method that considers two factors: topicality and relevance. The former captures how topical a sentence is and is measured from location information and word frequencies in the article. The latter captures how relevant a sentence is as a GeneRIF, based on the vocabulary used in the article. We formalize a probabilistic model combining these features. Our method is evaluated on the test set of 139 MEDLINE abstracts, and the results demonstrate that (a) function words in the input can help to identify gene function descriptions, (b) there is a vocabulary peculiar to GeneRIFs, and (c) location information shows the highest predictive power for this particular task despite its simplicity. Additionally, we compare our method with some alternative methods.
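To make the idea of combining topicality and relevance concrete, the following is a minimal sketch and not the model from the paper, whose exact formulation is not given in this abstract. It assumes that each candidate (the title plus the abstract sentences) is scored by summing three log terms: a location prior, the average log-probability of its words under the article's own word frequencies, and the average log-probability of its words under a unigram model of GeneRIF vocabulary. The function names, the `position_prior` values, the `generif_unigram` table, and the smoothing floor are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, and strip simple punctuation."""
    tokens = (w.strip(".,;:()").lower() for w in text.split())
    return [t for t in tokens if t]


def mean_log_prob(tokens, prob, floor=1e-6):
    """Average log-probability of tokens under a unigram model (a dict).

    Words missing from the model get the probability `floor`, an
    assumed smoothing constant.
    """
    if not tokens:
        return math.log(floor)
    return sum(math.log(prob.get(t, floor)) for t in tokens) / len(tokens)


def select_sentence(candidates, position_prior, generif_unigram):
    """Return (best_index, scores) for the candidate sentences.

    candidates[0] is taken to be the title and the rest the abstract
    sentences. position_prior[i] is an assumed probability (> 0) that
    the candidate at position i is the gene function description.
    """
    tokenized = [tokenize(c) for c in candidates]

    # Word frequencies over the whole article (title plus abstract).
    counts = Counter(t for toks in tokenized for t in toks)
    total = sum(counts.values())
    article_unigram = {w: c / total for w, c in counts.items()}

    scores = []
    for i, toks in enumerate(tokenized):
        # Topicality: location plus word frequencies in the article.
        topicality = math.log(position_prior[i]) + mean_log_prob(toks, article_unigram)
        # Relevance: how GeneRIF-like the vocabulary is.
        relevance = mean_log_prob(toks, generif_unigram)
        scores.append(topicality + relevance)

    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores
```

Calling `select_sentence(candidates, position_prior, generif_unigram)` with a list of sentences, a matching list of position probabilities, and a word-to-probability dictionary returns the index of the highest-scoring candidate together with all scores. In a real setting the prior and the GeneRIF vocabulary model would be estimated from training data rather than written by hand.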

Kazuhiro Seki, Nihar Sheth, Javed Mostafa
