Recently, the amount of machine-readable information in the biomedical field has grown. With this growth comes a desire to extract information, answer questions, and perform similar tasks based on the information in the documents. Many of these tasks require sophisticated language processing algorithms, such as part-of-speech tagging, parsing, and semantic interpretation. To use these algorithms, the text must first be cleansed of acronyms, abbreviations, and misspellings. In this paper we look at identifying, expanding, and disambiguating acronyms in biomedical texts. We present an integrated system that combines previously used methods for dealing with acronyms and Natural Language Processing techniques in a new way for a new domain. The resulting system achieves high precision and recall. We break the task up into three modular steps: Identification, Expansion, and Disambiguation. During identification, each word is examined to d...
David B. Bracewell, Fuji Ren, Shingo Kuroiwa