The vocabulary of the TREC Legal OCR collection is noisy and huge. Standard techniques for improving retrieval performance such as content-based query expansion are ineffective fo...
Recent advances in information retrieval over hyperlinked corpora have convincingly demonstrated that links carry less noisy information than text. We investigate the feasibility of...
Automatic extraction of semantic information from text and links in Web pages is key to improving the quality of search results. However, the assessment of automatic semantic meas...
Ana Gabriela Maguitman, Filippo Menczer, Heather R...
In participating in this CLEF evaluation campaign, our first objective is to propose and evaluate various indexing and search strategies for the Russian language, in order to obta...
Enterprise corpora contain evidence of what employees work on and therefore can be used to automatically find experts on a given topic. We present a general approach for represen...