Structured entity identification and document categorization: two tasks with one joint model

Traditionally, research on identifying structured entities in documents has proceeded independently of research on document categorization. In this paper, we observe that these two tasks have much to gain from each other. Apart from direct references to entities in a database, such as names of person entities, documents often also contain words that are correlated with discriminative entity attributes, such as the age group and income level of persons. This happens naturally in many enterprise domains such as CRM and banking. Entity identification, which is typically vulnerable to noise and incompleteness in direct references to entities in documents, can therefore benefit from document categorization with respect to such attributes. In return, entity identification enables documents to be categorized according to different label sets arising from entity attributes without requiring any supervision. In this paper, we propose a probabilistic generative model for joint entity identification an...

Indrajit Bhattacharya, Shantanu Godbole, Sachindra Joshi

Data Mining | Document Categorization | Entity Identification | Joint Entity Identification | KDD 2008

Added	30 Nov 2009
Updated	30 Nov 2009
Type	Conference
Year	2008
Where	KDD
Authors	Indrajit Bhattacharya, Shantanu Godbole, Sachindra Joshi
