Generative models for name disambiguation


Name ambiguity is a special case of identity uncertainty in which one person can be referenced by multiple name variations in different situations, or can share the same name with other people. In this paper, we present an efficient framework that uses two novel topic-based models, extended from Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA). Our models explicitly introduce a new variable for persons and learn the distribution of topics with regard to persons and words. Experiments indicate that our approach consistently outperforms other unsupervised methods, including spectral and DBSCAN clustering. Scalability is addressed by disambiguating authors in over 750,000 papers from the entire CiteSeer dataset.

Categories and Subject Descriptors: H.3.3 [Information Systems]: Information Search and Retrieval

General Terms: Algorithms, Experimentation, Theory

Keywords: Unsupervised Machine Learning, Name Disambiguation
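
As a rough illustration of what "introducing a new variable for persons" could look like in the PLSA case, the sketch below first states the standard PLSA aspect model and then one plausible three-way extension. The symbols d (document), w (word), a (person), and z (latent topic) are notation chosen here, and the conditional-independence form of the second equation is an assumption, not something stated in the abstract.

```latex
% Standard PLSA: documents and words are independent given the topic z.
P(d, w) = \sum_{z} P(z)\, P(d \mid z)\, P(w \mid z)

% Assumed extension with a person variable a (illustrative only):
% documents, words, and persons are all independent given the topic z.
P(d, w, a) = \sum_{z} P(z)\, P(d \mid z)\, P(w \mid z)\, P(a \mid z)

% Under this assumption, the topic profile of a person follows from Bayes' rule:
P(z \mid a) = \frac{P(z)\, P(a \mid z)}{\sum_{z'} P(z')\, P(a \mid z')}
```

Under this reading, two name mentions whose topic profiles P(z | a) are close would be candidates for the same person, which is one way an unsupervised clustering step could use the learned distributions.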

Yang Song, Jian Huang 0002, Isaac G. Councill, Jia Li, C. Lee Giles


Internet Technology | Keywords Unsupervised Machine | Latent Dirichlet Allocation | Latent Semantic Analysis | WWW 2007 |


» Web Person Name Disambiguation by Relevance Weighting of Extended Feature Sets

» Using Encyclopedic Knowledge for Named Entity Disambiguation

» SyGAR: A Synthetic Data Generator for Evaluating Name Disambiguation Methods

» A hierarchical naive Bayes mixture model for name disambiguation in author citations

» Citation data clustering for author name disambiguation

» A unified framework for name disambiguation

» AZuRE, a Scalable System for Automated Term Disambiguation of Gene and Protein Names

» Classifying Relations for Biomedical Named Entity Disambiguation

Post Info

Added	21 Nov 2009
Updated	21 Nov 2009
Type	Conference
Year	2007
Where	WWW
Authors	Yang Song, Jian Huang 0002, Isaac G. Councill, Jia Li, C. Lee Giles
