For bounded datasets such as the TREC Web Track (WT10g), computing term frequency (TF) and inverse document frequency (IDF) is not difficult. However, when the corpus is th...
Retrieving semistructured (XML) data typically requires either a structured query such as XPath, or a keyword query that does not take structure into account. In this pap...
Ranking for multilingual information retrieval (MLIR) is the task of ranking documents in different languages solely by their relevance to the query, regardless of the query’s langu...
We present an account of designing and evaluating a university-wide expert search engine. We performed system-based evaluation to determine the optimal retrieval settings and an ex...
When automatic plagiarism detection is carried out considering a reference corpus, a suspicious text is compared to a set of original documents in order to relate the pla...
Evaluating a complex system is a complex task. Evaluation campaigns are organized each year to test different systems on global results, but they do not evaluate the relevance of th...
Blog post opinion retrieval is the problem of identifying posts that express an opinion about a particular topic. Usually the problem is solved using a three-step process in which rel...
Understanding the intent underlying user queries may help personalize search results and improve user satisfaction. In this paper, we develop a methodology for using ad clickthroug...
Azin Ashkan, Charles L. A. Clarke, Eugene Agichtei...
We explore the utility of different types of topic models for retrieval purposes. Based on prior work, we describe several ways that topic models can be integrated into the retrie...