The desire for definitive data, and the Semantic Web drive for inference over heterogeneous data sources, both require co-reference resolution to be performed on those data. In particular, name disambiguation is required before accurate publication lists, citation counts and impact measures can be determined. This paper describes a graph-based approach to author disambiguation on large-scale citation networks. Using self-citation, co-authorship and document source analyses, AKTiveAuthor clusters papers, achieving a precision of 0.997 and a recall of 0.818 over a test group of eight surname clusters.

Categories and Subject Descriptors
H.3.3 [Information Systems]: Information Search and Retrieval

General Terms
Algorithms

Keywords
Name Disambiguation, Self-Citation, Metadata Analysis
Duncan M. McRae-Spencer, Nigel R. Shadbolt
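To make the idea of graph-based clustering concrete, the following is a minimal Python sketch. It is not AKTiveAuthor's algorithm, and every name in it (`cluster_papers`, `UnionFind`, the input shapes) is hypothetical. It assumes all candidate papers carry the same ambiguous author name. Two assumptions drive the merging: a citation between two candidate papers is treated as a self-citation, and two candidate papers that share any co-author are treated as having the same author. Clusters are the connected components of the resulting graph, computed here with union-find. Document source analysis is not modelled.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure used to merge papers into author clusters."""

    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_papers(papers, citations):
    """Hypothetical sketch of author clustering.

    papers: dict mapping paper id -> set of co-author names
            (excluding the ambiguous author name itself).
    citations: iterable of (citing_id, cited_id) pairs.
    Returns a sorted list of clusters, each a sorted list of paper ids.
    """
    uf = UnionFind(papers)

    # Assumed self-citation edges: a citation between two candidate papers
    # (which share the ambiguous name) links them. Citations to papers
    # outside the candidate set are ignored.
    for citing, cited in citations:
        if citing in papers and cited in papers:
            uf.union(citing, cited)

    # Assumed co-authorship edges: papers sharing any co-author are linked.
    by_coauthor = defaultdict(list)
    for pid, coauthors in papers.items():
        for name in coauthors:
            by_coauthor[name].append(pid)
    for pids in by_coauthor.values():
        for other in pids[1:]:
            uf.union(pids[0], other)

    # Collect the connected components as clusters.
    clusters = defaultdict(list)
    for pid in papers:
        clusters[uf.find(pid)].append(pid)
    return sorted(sorted(c) for c in clusters.values())


if __name__ == "__main__":
    papers = {
        "p1": {"Shadbolt"},
        "p2": {"Shadbolt", "Hall"},
        "p3": {"Smith"},
        "p4": set(),
    }
    citations = [("p4", "p3")]
    print(cluster_papers(papers, citations))  # [['p1', 'p2'], ['p3', 'p4']]
```

Because union-find takes the transitive closure, a single spurious edge merges two whole clusters. That loss of precision would be the main risk of this naive version, so a practical system would need to weigh its evidence before merging.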