We present an NLP system that classifies the assertion type of medical problems in clinical notes used for the Fourth i2b2/VA Challenge. Our classifier uses a variety of linguistic features, including lexical, syntactic, lexicosyntactic, and contextual features. To overcome an extremely unbalanced distribution of assertion types in the data set, we focused our efforts on adding features specifically to improve the performance of minority classes. As a result, our system reached 94.17% micro-averaged and 79.76% macro-averaged F1-measures, and showed substantial recall gains on the minority classes.
Youngjun Kim, Ellen Riloff, Stèphane M. Mey