Natural language processing (NLP) is being applied to several information extraction tasks in the biomedical domain. The unique nature of clinical information calls for an NLP system designed specifically for the clinical domain. We describe a method to identify semantically coherent phrases within clinical reports, an important step towards full syntactic parsing within a clinical NLP system. We use this semantic phrase chunker to identify anatomical phrases within radiology reports related to the genitourinary domain. A discriminative classifier based on support vector machines was used to classify each word into one of five phrase classification categories. The classifier was trained on 1000 hand-tagged sentences from a corpus of genitourinary radiology reports. Features used by the classifier include n-grams, syntactic tags, and semantic labels. Evaluation was conducted on a blind test set of 250 sentences from the same domain. The system...
Vijayaraghavan Bashyam, Ricky K. Taira
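
To make the per-word classification setup in the abstract concrete, the following is a minimal sketch of how the three feature families it names (n-grams, syntactic tags, semantic labels) might be assembled for each word before being passed to a classifier. The abstract does not give the actual feature templates, so the function names, the window size, the padding token, and the tag and label values in the example sentence are all illustrative assumptions, not the authors' implementation.

```python
from typing import Dict, List, Tuple

# Each token is (word, syntactic_tag, semantic_label).
# The tag and label values used below are hypothetical placeholders.
Token = Tuple[str, str, str]

# Hypothetical padding token for positions outside the sentence.
PAD: Token = ("<pad>", "<pad>", "<pad>")


def _at(sentence: List[Token], j: int) -> Token:
    """Return the token at position j, or PAD if j is out of range."""
    return sentence[j] if 0 <= j < len(sentence) else PAD


def token_features(sentence: List[Token], i: int, window: int = 1) -> Dict[str, str]:
    """Build a feature dict for the word at position i.

    Covers the word, syntactic tag, and semantic label at every position
    within `window` of i, plus the two word bigrams that include word i.
    """
    feats: Dict[str, str] = {}
    for offset in range(-window, window + 1):
        word, syn, sem = _at(sentence, i + offset)
        feats[f"word[{offset}]"] = word.lower()
        feats[f"syn[{offset}]"] = syn
        feats[f"sem[{offset}]"] = sem

    prev_word = _at(sentence, i - 1)[0].lower()
    cur_word = sentence[i][0].lower()
    next_word = _at(sentence, i + 1)[0].lower()
    feats["bigram[-1,0]"] = f"{prev_word} {cur_word}"
    feats["bigram[0,1]"] = f"{cur_word} {next_word}"
    return feats


def sentence_features(sentence: List[Token], window: int = 1) -> List[Dict[str, str]]:
    """Return one feature dict per word in the sentence."""
    return [token_features(sentence, i, window) for i in range(len(sentence))]


# Illustrative sentence; tags and labels are made up for this sketch.
example_sentence: List[Token] = [
    ("The", "DT", "other"),
    ("left", "JJ", "laterality"),
    ("kidney", "NN", "anatomy"),
    ("is", "VBZ", "other"),
    ("normal", "JJ", "finding"),
]

example_features = sentence_features(example_sentence)
```

Each dict describes one word in its local context. One possible route from here, again an assumption rather than the method reported above, is to one-hot encode the dicts (for example with scikit-learn's `DictVectorizer`) and train a multi-class support vector machine such as `LinearSVC` on the hand-tagged phrase category of each word.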