CiteData: a new multi-faceted dataset for evaluating personalized search performance


Download: nyc.lti.cs.cmu.edu

Personalized search systems have evolved to utilize heterogeneous features, including document hyperlinks, category labels in various taxonomies and social tags, in addition to the free text of the documents. Consequently, classifiers, PageRank algorithms and Collaborative Filtering methods are often used as intermediate steps in such personalized retrieval systems. Thorough comparative evaluation of such complex systems has been difficult due to the lack of appropriate publicly available datasets that provide such diverse feature sets. To remedy the situation, we have created CiteData, a new dataset for benchmark evaluations of personalized search performance, which will be made publicly accessible. CiteData is a collection of academic articles extracted from the CiteULike and CiteSeer repositories, with rich feature sets such as authors, author affiliations, topic labels, social tags and citation information. We further supplement it with personalized queries and relevance judgments which wer...
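The abstract names PageRank as one of the intermediate steps in personalized retrieval, and CiteData includes citation information among its features. Purely as an illustration, the sketch below runs standard power-iteration PageRank over a tiny citation graph. The `pagerank` function, the toy graph and its paper IDs, the edge direction (citing paper to cited paper), and the damping factor of 0.85 are all assumptions for this example; they are not taken from the paper or from CiteData's actual file format.

```python
# Minimal PageRank by power iteration over a toy citation graph.
# Assumption: edges point from a citing paper to the papers it cites;
# the paper IDs below are hypothetical and not drawn from CiteData.


def pagerank(citations, damping=0.85, tol=1e-10, max_iter=100):
    """Return {paper: score} for a {paper: [cited papers]} graph."""
    # Collect every node, including papers that are only ever cited.
    nodes = set(citations)
    for cited in citations.values():
        nodes.update(cited)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}

    for _ in range(max_iter):
        # Every node receives an equal teleportation share.
        new_rank = {p: (1.0 - damping) / n for p in nodes}
        # Rank held by dangling nodes (no outgoing citations) is spread uniformly.
        dangling = sum(rank[p] for p in nodes if not citations.get(p))
        for p in nodes:
            new_rank[p] += damping * dangling / n
        # Each citing paper splits its rank evenly among the papers it cites.
        for p, cited in citations.items():
            if cited:
                share = damping * rank[p] / len(cited)
                for q in cited:
                    new_rank[q] += share
        # Stop once the L1 change between iterations is negligible.
        delta = sum(abs(new_rank[p] - rank[p]) for p in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical graph: A cites B and C, B cites C, D cites C, C cites nothing.
graph = {"A": ["B", "C"], "B": ["C"], "C": [], "D": ["C"]}
scores = pagerank(graph)
for paper, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(paper, round(score, 4))
```

In this toy graph the most-cited paper, C, receives the highest score. A personalized system could reuse such scores as one feature among many; a personalized variant would replace the uniform teleportation vector with a user-specific one, which is a design choice beyond what the abstract describes.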

Abhay Harpale, Yiming Yang, Siddharth Gopal, Daqing He, Zhen Yue


CIKM 2010 | Information Technology | Intermediate Steps | Personalized | Social Tags


Post Info

Added	24 Jan 2011
Updated	24 Jan 2011
Type	Conference
Year	2010
Where	CIKM
Authors	Abhay Harpale, Yiming Yang, Siddharth Gopal, Daqing He, Zhen Yue


Sciweavers
