The task of coreference resolution requires people or systems to decide when two referring expressions refer to the `same' entity or event. In real text, this is often a difficult decision because identity is never adequately defined, leading to contradictory treatment of cases in previous work. This paper introduces the concept of `near-identity', a middle-ground category between identity and non-identity, to handle such cases systematically. We present a typology of Near-Identity Relations (NIDENT) comprising fifteen types, grouped under four main families, that capture a wide range of ways in which (near-)coreference relations hold between discourse entities. We validate the theoretical model by annotating a small sample of real data and showing that inter-annotator agreement is high enough for stability (K = 0.58, and up to K = 0.65 and K = 0.84 when leaving out one and two outliers, respectively). This work enables subsequent creation of the first internally consistent ...
Marta Recasens, Eduard H. Hovy, Maria Antòn