Finding biological entities (such as genes or proteins) in text that satisfy given conditions is an important and challenging task in biomedical information retrieval and text mining. It is essential for many biomedical applications, such as drug discovery, which normally requires collecting existing scientific facts from documents. This paper presents an effective IR system for this task, in which 1) domain knowledge is incorporated to improve retrieval effectiveness; 2) query expansion with related concepts on multiple semantic levels is employed; and 3) a gene symbol disambiguation technique is implemented. We evaluate these techniques and examine two different concept-based IR models. Experiments based on the proposed framework yield significant improvements (22% for automatic and 16.7% for nonautomatic) over the best reported passage retrieval results in the Genomics track of TREC 2007.
Categories and Subject Descriptors: H.3.3 [Information Storage and Retrieval]: Informatio...
Wei Zhou, Clement T. Yu, Weiyi Meng