Disambiguating authors in academic publications using random forests



Users of digital libraries usually want to know the exact author or authors of an article. But different authors may share the same name, either as a full name or as initials and a last name (cases where an author changes names entirely are not considered here). In such cases, the user would like the digital library to differentiate among these authors. Name disambiguation helps in many situations, for example when a user searches for all articles written by a particular author. Disambiguation also enables better bibliometric analysis by allowing more accurate counting and grouping of publications and citations. In this paper, we describe an algorithm for pairwise disambiguation of author names based on a machine learning classification algorithm, random forests. We define a set of similarity profile features to assist in author disambiguation. Our experiments on the Medline database show that the random forest model outperforms other previously proposed techniques such as those using support-vector...
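To make the idea of a pairwise similarity profile more concrete, here is a minimal Python sketch. It is not the feature set from the paper: the `AuthorRecord` schema, the five features, and the scoring rules are all assumptions chosen for illustration. Each pair of author mentions is turned into a fixed-length vector of similarity scores, which a classifier such as a random forest could then label as "same author" or "different authors".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthorRecord:
    """One author mention on one paper (hypothetical schema, not from the paper)."""
    first_name: str
    last_name: str
    affiliation: str = ""
    coauthors: frozenset = frozenset()
    title_words: frozenset = frozenset()


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def first_name_match(a, b):
    """Score first-name agreement (illustrative rule, an assumption).

    1.0 for identical full names, 0.5 when only initials can be
    compared and they agree, 0.0 otherwise.
    """
    x, y = a.lower().strip("."), b.lower().strip(".")
    if not x or not y:
        return 0.0
    if x == y:
        return 1.0 if len(x) > 1 else 0.5
    if (len(x) == 1 or len(y) == 1) and x[0] == y[0]:
        return 0.5
    return 0.0


def similarity_profile(r1, r2):
    """Build the feature vector for one pair of author mentions."""
    return [
        first_name_match(r1.first_name, r2.first_name),
        1.0 if r1.last_name.lower() == r2.last_name.lower() else 0.0,
        jaccard(set(r1.affiliation.lower().split()),
                set(r2.affiliation.lower().split())),
        jaccard(r1.coauthors, r2.coauthors),
        jaccard(r1.title_words, r2.title_words),
    ]


# Two mentions that may or may not refer to the same person (made-up data).
a = AuthorRecord("J.", "Smith", "Penn State University",
                 frozenset({"lee", "wang"}), frozenset({"protein", "folding"}))
b = AuthorRecord("John", "Smith", "Penn State University",
                 frozenset({"lee", "chen"}), frozenset({"protein", "binding"}))

profile = similarity_profile(a, b)
print(profile)
```

In a full pipeline, vectors like `profile` would be computed for many labeled pairs and used to train a classifier; scikit-learn's `RandomForestClassifier` is one readily available option, though the abstract does not say which implementation the authors used.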

Pucktada Treeratpituk, C. Lee Giles


Author Disambiguation | Disambiguation | Education | JCDL 2009 | Random Forests |


Added	28 May 2010
Updated	28 May 2010
Type	Conference
Year	2009
Where	JCDL
Authors	Pucktada Treeratpituk, C. Lee Giles
