Sciweavers

229 search results - page 7 / 46
» Evaluation measures for preference judgments
CIKM
2003
Springer
14 years 1 month ago
Using titles and category names from editor-driven taxonomies for automatic evaluation
Evaluation of IR systems has always been difficult because of the need for manually assessed relevance judgments. The advent of large editor-driven taxonomies on the web opens the...
Steven M. Beitzel, Eric C. Jensen, Abdur Chowdhury...
BMCBI
2008
13 years 8 months ago
Objective and automated protocols for the evaluation of biomedical search engines using No Title Evaluation protocols
Background: The evaluation of information retrieval techniques has traditionally relied on human judges to determine which documents are relevant to a query and which are not. Thi...
Fabien Campagne
WSDM
2010
ACM
14 years 5 months ago
Measuring the Reusability of Test Collections
While test collection construction is a time-consuming and expensive process, the true cost is amortized by reusing the collection over hundreds or thousands of experiments. Some ...
Ben Carterette, Evgeniy Gabrilovich, Vanja Josifov...
NAACL
2010
13 years 6 months ago
The Best Lexical Metric for Phrase-Based Statistical MT System Optimization
Translation systems are generally trained to optimize BLEU, but many alternative metrics are available. We explore how optimizing toward various automatic evaluation metrics (BLEU...
Daniel Cer, Christopher D. Manning, Daniel Jurafsk...
WWW
2009
ACM
14 years 9 months ago
Learning consensus opinion: mining data from a labeling game
We consider the problem of identifying the consensus ranking for the results of a query, given preferences among those results from a set of individual users. Once consensus ranki...
Paul N. Bennett, David Maxwell Chickering, Anton M...