This paper focuses on how to select a small sample of examples for labeling that can help us evaluate many different classification models unknown at the time of sampl...
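A minimal sketch of the general idea behind the snippet above, under the simplifying assumption of plain uniform random sampling (the paper's own selection strategy is not given here): one sample is drawn and labeled once, then reused to estimate the accuracy of any models that arrive later. The names draw_sample and estimate_accuracy are hypothetical.

import random
from typing import Callable, List, Sequence, Tuple

def draw_sample(pool: Sequence, n: int, seed: int = 0) -> List:
    # Draw n examples uniformly at random from the unlabeled pool (assumption,
    # not the paper's method); these are the examples sent for labeling.
    rng = random.Random(seed)
    return rng.sample(list(pool), n)

def estimate_accuracy(model: Callable, labeled_sample: List[Tuple]) -> float:
    # labeled_sample holds (x, y) pairs obtained by labeling the drawn examples;
    # the same sample can score any later model without new labeling effort.
    correct = sum(1 for x, y in labeled_sample if model(x) == y)
    return correct / len(labeled_sample)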
In this paper, we propose a novel top-k learning-to-rank framework, which involves a labeling strategy, a ranking model, and an evaluation measure. The motivation comes from the difficul...
Community detection or cluster detection in networks is a well-studied, albeit hard, problem. Given the scale and complexity of modern-day social networks, detecting “...
Yang Yang, Yizhou Sun, Saurav Pandit, Nitesh V. Ch...
Dependency parsing is a central NLP task. In this paper we show that the common evaluation for unsupervised dependency parsing is highly sensitive to problematic annotations. We s...
Roy Schwartz, Omri Abend, Roi Reichart, Ari Rappop...
A useful ability for search engines is to rank objects with novelty and diversity: the top k documents retrieved should cover possible interpretations of a que...
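As a hedged illustration of what novelty and diversity in a top-k list can mean (not the method of the paper above), the sketch below greedily selects documents by an MMR-style score that trades relevance against similarity to documents already picked; diversified_top_k, relevance, similarity, and lam are hypothetical names.

from typing import Callable, Dict, List

def diversified_top_k(candidates: List[str],
                      relevance: Dict[str, float],
                      similarity: Callable[[str, str], float],
                      k: int,
                      lam: float = 0.7) -> List[str]:
    # Greedily pick k documents, each time maximizing
    # lam * relevance - (1 - lam) * (max similarity to already selected docs),
    # so later picks favor query interpretations not yet covered.
    selected: List[str] = []
    pool = list(candidates)
    while pool and len(selected) < k:
        def score(doc: str) -> float:
            redundancy = max((similarity(doc, s) for s in selected), default=0.0)
            return lam * relevance[doc] - (1.0 - lam) * redundancy
        best = max(pool, key=score)
        selected.append(best)
        pool.remove(best)
    return selected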
One of the central issues in Learning to Rank (L2R) for Information Retrieval is to develop algorithms that construct ranking models by directly optimizing evaluation measures ...
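For context on what "directly optimizing evaluation measures" refers to, here is a minimal sketch of NDCG@k, one typical rank-based measure that such L2R algorithms target; the abstract above does not name a specific measure, and dcg_at_k / ndcg_at_k are hypothetical helpers.

import math
from typing import List, Sequence

def dcg_at_k(gains: Sequence[float], k: int) -> float:
    # Discounted cumulative gain over the top-k positions (log2 position discount).
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg_at_k(relevances_in_ranked_order: List[float], k: int) -> float:
    # Normalize by the DCG of the ideal (relevance-sorted) ordering.
    ideal = sorted(relevances_in_ranked_order, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(relevances_in_ranked_order, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Example: a ranking that places the most relevant document third rather than first.
print(ndcg_at_k([0, 1, 2, 0, 1], k=5))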
We present a large-scale meta-evaluation of eight evaluation measures for both single-document and multi-document summarizers. To this end, we built a corpus consisting of (a) 100 ...
Dragomir R. Radev, Simone Teufel, Horacio Saggion,...
Most state-of-the-art evaluation measures for machine translation assign high costs to movements of word blocks. In many cases, though, such movements still result in correct or alm...
Evaluation measures play an important role in machine learning because they are used not only to compare different learning algorithms, but also often as goals to optimize in cons...
We describe the WebCLEF 2008 task. As in the 2007 edition of WebCLEF, the 2008 edition implements a multilingual "information synthesis" task, where, for a given t...