
CIKM
2010
Springer

CiteData: a new multi-faceted dataset for evaluating personalized search performance

Personalized search systems have evolved to use heterogeneous features, including document hyperlinks, category labels from various taxonomies, and social tags, in addition to the free text of the documents. Consequently, classifiers, PageRank algorithms, and collaborative filtering methods are often used as intermediate steps in such personalized retrieval systems. Thorough comparative evaluation of these complex systems has been difficult because of the lack of publicly available datasets that provide such diverse feature sets. To remedy this situation, we have created CiteData, a new dataset for benchmark evaluations of personalized search performance, which will be made publicly accessible. CiteData is a collection of academic articles extracted from the CiteULike and CiteSeer repositories, with rich feature sets such as authors, author affiliations, topic labels, social tags, and citation information. We further supplement it with personalized queries and relevance judgments which wer...
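Because the abstract describes pairing personalized queries with relevance judgments, the following is a minimal sketch of how such judgments could be used to score ranked result lists with mean average precision (MAP). The choice of MAP, the `(user, query)` key layout, and the function names are illustrative assumptions, not CiteData's actual file format or the paper's evaluation protocol.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical key: a personalized query is identified by (user_id, query_id).
QueryKey = Tuple[str, str]


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list of doc IDs against its relevant set."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant hit
    return precision_sum / len(relevant)


def mean_average_precision(runs: Dict[QueryKey, List[str]],
                           qrels: Dict[QueryKey, Set[str]]) -> float:
    """MAP over all judged (user, query) pairs; a missing run scores 0."""
    if not qrels:
        return 0.0
    total = sum(average_precision(runs.get(key, []), rel)
                for key, rel in qrels.items())
    return total / len(qrels)


if __name__ == "__main__":
    # Toy, made-up data: one user, one query, three ranked articles.
    runs = {("u1", "q1"): ["d3", "d1", "d7"]}
    qrels = {("u1", "q1"): {"d1", "d7"}}
    print(round(mean_average_precision(runs, qrels), 4))  # prints 0.5833
```

Keying runs and judgments by user as well as query keeps the same query text from different users distinct, which is the point of evaluating personalized rather than generic ranking.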
Added 24 Jan 2011
Updated 24 Jan 2011
Type Journal
Year 2010
Where CIKM
Authors Abhay Harpale, Yiming Yang, Siddharth Gopal, Daqing He, Zhen Yue