Recently, many advanced machine learning approaches have been proposed for coreference resolution; however, all of the discriminatively-trained models reason over mentions rather than entities. That is, they do not explicitly contain variables indicating the “canonical” values for each attribute of an entity (e.g., name, venue, title, etc.). This canonicalization step is typically implemented as a post-processing routine to coreference resolution prior to adding the extracted entity to a database. In this paper, we propose a discriminatively-trained model that jointly performs coreference resolution and canonicalization, enabling features over hypothesized entities. We validate our approach on two different coreference problems: newswire anaphora resolution and research paper citation matching, demonstrating improvements in both tasks and achieving an error reduction of up to 62% when compared to a method that reasons about mentions only.
Michael L. Wick, Aron Culotta, Khashayar Rohaniman