This paper investigates the agreement of relevance assessments between official TREC judgments and those generated from an interactive IR experiment. Results show that 63% of docu...
Assessors frequently disagree on the topical relevance of documents. How much of this disagreement is due to ambiguity in assessment instructions? We have two assessors assess TRE...
Information retrieval evaluation has typically been performed over several dozen queries, each judged to near-completeness. There has been a great deal of recent work on evaluatio...
Ben Carterette, Virgiliu Pavlu, Evangelos Kanoulas...
There has been recent interest in collecting user or assessor preferences, rather than absolute judgments of relevance, for the evaluation or learning of ranking algorithms. Since...
Information retrieval experimentation generally proceeds in a cycle of development, evaluation, and hypothesis testing. Ideally, the evaluation and testing phases should be short ...