We present a maximum-entropy-based system that incorporates a diverse set of features for identifying genes and proteins in biomedical text. This system was entered in the BioCreative comparative evaluation, where it achieved the best performance in the "open" evaluation and the second-best performance in the "closed" evaluation. Its central contributions are the rich use of features derived from the training data at multiple levels of granularity, a focus on correctly identifying entity boundaries, and the innovative use of several external knowledge sources, including full MEDLINE abstracts and web searches.
Jenny Rose Finkel, Shipra Dingare, Christopher D.