Many natural language processing approaches of varying complexity have been reported for extracting biochemical interactions from MEDLINE. Some algorithms based on simple template matching cannot handle complex syntactic structures, while others that exploit sophisticated parsing techniques are hindered by greater computational cost. This study investigates link grammar parsing for extracting biochemical interactions. Link grammar parsing can handle many syntactic structures and is relatively efficient computationally. We experimented on a sample MEDLINE corpus. Although the parser was originally developed for conversational English and made many mistakes when parsing sentences from the biochemical domain, it nevertheless achieved better overall performance than a co-occurrence-only method. Customizing the parser for the biomedical domain is expected to improve its performance further.
Jing Ding, Daniel Berleant, Jun Xu, Andy W. Fulmer
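
For readers unfamiliar with the baseline named in the abstract, a co-occurrence-only method can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `cooccurring_pairs`, the tokenization rule, and the three-term lexicon are all assumptions made for the example. It reports every pair of known terms that appear in the same sentence, with no syntactic analysis.

```python
import re
from itertools import combinations


def cooccurring_pairs(sentence, lexicon):
    """Return all pairs of lexicon terms that appear together in one sentence.

    Hypothetical helper for illustration only: single-word terms,
    case-insensitive matching, no syntactic analysis of any kind.
    """
    # Lowercase the sentence and split it into simple word tokens.
    tokens = set(re.findall(r"[a-z0-9\-]+", sentence.lower()))
    # Keep the lexicon terms that occur as tokens, in a stable order.
    found = sorted(term for term in lexicon if term.lower() in tokens)
    # Every unordered pair of co-occurring terms counts as a candidate.
    return list(combinations(found, 2))


# Assumed toy lexicon; a real system would use a curated term list.
LEXICON = {"insulin", "glucose", "leptin"}

sentence = "Insulin does not interact with leptin in this assay."
print(cooccurring_pairs(sentence, LEXICON))
```

In this made-up sentence the pair is reported even though the sentence denies an interaction, which is the kind of case where a parser that recovers syntactic structure has room to do better than co-occurrence alone.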