We propose in this paper an automatic evaluation procedure based on a metric which could provide summary evaluation without human assistance. Our system includes two metrics, which are presented and discussed. The first metric is based on a known and powerful statistical test, the 2 goodness-of-fit test, and has been used in several applications. The second metric is derived from three common metrics used to evaluate Natural Language Processing (NLP) systems, namely precision, recall and f-measure. The combination of these two metrics is intended to allow one to assess the quality of summaries quickly, cheaply and without the need of human intervention, minimizing though, the role of subjective judgment and bias.
Paulo C. F. de Oliveira, Edson Wilson Torrens, Ale