Do We Still Need Gold Standards for Evaluation?


Available from www.lrec-conf.org

The availability of huge amounts of textual data in electronic format has increased the need for fast and accurate techniques for processing text. Machine learning and statistical approaches have been used increasingly in NLP since the 1990s, mainly because they are fast, versatile and efficient. However, despite this evolution of the field, evaluation still relies, most of the time, on comparing the output of a probabilistic or statistical system with a non-statistical, usually hand-crafted, gold standard. Because these two sets of data are inherently different in nature, the statistical output must first be modified to fit the hand-crafted reference before they can be compared. For example, instead of producing a grammaticality score, a statistical parser will have to produce either a binary value for each sentence (grammatical vs. ungrammatical) or a tree similar to the one stored in the treebank used a...
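The conversion step described above can be illustrated with a minimal sketch. It assumes a parser that emits a continuous grammaticality score in [0, 1] and a gold standard that stores only a binary label per sentence; the scores, labels, threshold values and function names below are hypothetical and are not taken from the paper. The example shows that the cut-off chosen to binarize the scores discards graded information and can change the resulting accuracy.

```python
from typing import List


def binarize(scores: List[float], threshold: float = 0.5) -> List[bool]:
    """Map each continuous grammaticality score to a binary label.

    Hypothetical helper: a sentence counts as grammatical (True) when its
    score is at or above the threshold. Scores of 0.51 and 0.99 both become
    True, so the graded information is lost.
    """
    return [s >= threshold for s in scores]


def accuracy(predicted: List[bool], gold: List[bool]) -> float:
    """Fraction of sentences whose binarized label matches the gold label."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("predicted and gold must be non-empty and equal length")
    matches = sum(p == g for p, g in zip(predicted, gold))
    return matches / len(gold)


# Hypothetical parser scores and hand-crafted gold labels for four sentences.
scores = [0.92, 0.35, 0.61, 0.08]
gold = [True, False, False, False]

print(binarize(scores))                            # [True, False, True, False]
print(accuracy(binarize(scores), gold))            # 0.75
print(accuracy(binarize(scores, threshold=0.7), gold))  # 1.0
```

The same system output scores 0.75 or 1.0 depending only on the assumed threshold, which is one concrete way the fit to a binary gold standard shapes the evaluation result.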

Thierry Poibeau, Cédric Messiant
