This paper offers a novel look at using a dimensionality-reduction technique called simhash [8] to detect similar document pairs in large-scale collections. We show that this algo...
We present the user evaluation of two recommendation server methodologies implemented for the NASA Technical Report Server (NTRS). One methodology for generating recommendations u...
Michael L. Nelson, Johan Bollen, JoAnne R. Calhoun...
Automatically categorizing documents into pre-defined topic hierarchies or taxonomies is a crucial step in knowledge and content management. Standard machine learning techniques ...
Time is an important dimension of relevance for a large number of searches, such as over blogs and news archives. So far, research on searching over such collections has largely f...
Wisam Dakka, Luis Gravano, Panagiotis G. Ipeirotis
As the number of non-English documents on the web increases dramatically, the study and design of information retrieval systems for these languages are very important....
Abolfazl AleAhmad, Hadi Amiri, Masoud Rahgozar, Fa...