Enhancing Cross Document Coreference of Web Documents with Context Similarity and Very Large Scale Text Categorization

Cross Document Coreference (CDC) is the task of constructing the coreference chain for mentions of a person across a set of documents. This work offers a holistic view of using document-level categories, sub-document-level context, and extracted entities and relations for the CDC task. We train a categorization component with an efficient flat algorithm using thousands of ODP categories and over a million web documents. We propose to use ranked categories as coreference information, which is particularly suitable for web documents that differ widely in style and content. An ensemble composite coreference function, amenable to inactive features, combines these three levels of evidence for disambiguation. A thorough feature importance study is conducted to analyze how these three components contribute to the coreference results. The overall solution is evaluated on the WePS benchmark data and demonstrates superior performance.
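To make the idea of combining several levels of evidence while tolerating inactive features more concrete, here is a minimal sketch. It is not the function described in the paper: the weighted average, the top-k category overlap, the feature names ("category", "context", "entity") and the weights are all assumptions chosen for illustration. A feature that cannot be computed for a document pair is passed as None and simply left out of the average.

```python
from typing import Dict, List, Optional


def category_overlap(ranked_a: List[str], ranked_b: List[str], k: int = 3) -> Optional[float]:
    """Jaccard overlap of the top-k ranked categories of two documents.

    Hypothetical stand-in for a category-based similarity; returns None
    (inactive) when neither document has any category.
    """
    top_a, top_b = set(ranked_a[:k]), set(ranked_b[:k])
    union = top_a | top_b
    if not union:
        return None
    return len(top_a & top_b) / len(union)


def combine_evidence(scores: Dict[str, Optional[float]],
                     weights: Dict[str, float]) -> Optional[float]:
    """Weighted average of similarity scores over the active features only.

    A score of None marks a feature as inactive for this document pair;
    the remaining weights are renormalized so the result stays in [0, 1]
    when every input score is in [0, 1].
    """
    active = {name: s for name, s in scores.items() if s is not None}
    total_weight = sum(weights[name] for name in active)
    if not active or total_weight == 0:
        return None  # no usable evidence for this pair
    return sum(weights[name] * s for name, s in active.items()) / total_weight


# Illustrative weights and scores (assumed values, not taken from the paper).
weights = {"category": 1.0, "context": 2.0, "entity": 1.0}
pair_scores = {
    "category": category_overlap(["Arts", "Sports", "News"],
                                 ["Sports", "Arts", "Science"]),
    "context": 1.0,
    "entity": None,  # inactive: no entities were extracted for this pair
}
combined = combine_evidence(pair_scores, weights)
```

Renormalizing over the active weights keeps a missing feature from being read as evidence against coreference, which is one simple way to handle inactive features; the paper's ensemble may treat them differently.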

Jian Huang 0002, Pucktada Treeratpituk, Sarah M. Taylor, C. Lee Giles

COLING 2010 | Computational Linguistics | Coreference | Documents | Web Documents |

Added	13 May 2011
Updated	13 May 2011
Type	Journal
Year	2010
Where	COLING
Authors	Jian Huang 0002, Pucktada Treeratpituk, Sarah M. Taylor, C. Lee Giles
