A large body of prior research on coreference resolution recasts the problem as a two-class classification problem. However, standard supervised machine learning algorithms that m...
Empirical performance evaluation is the process of measuring and calculating performance metrics of deployed software systems. It is a part of performance validation during testin...
Translation systems are generally trained to optimize BLEU, but many alternative metrics are available. We explore how optimizing toward various automatic evaluation metrics (BLEU...
Daniel Cer, Christopher D. Manning, Daniel Jurafsk...
We illustrate and explain problems of n-gram-based machine translation (MT) metrics (e.g. BLEU) when applied to morphologically rich languages such as Czech. A novel metric SemPO...
This paper presents a set of metrics for evaluating the operative aspects of E-Government SOA systems, based on technical and economic criteria as they are intended to improve ma...