In emotion recognition, a widely used method for reconciling disagreement between multiple human evaluators is majority voting on their assigned class labels. Instead, we propose asking evaluators to rank emotional categories for a given audio clip and then combining these ranked lists. We compare two well-known ranked-list voting methods, Borda count and Schulze's method, with majority voting and an evaluator model-based combination of the top-ranked labels. When tested on an emotional speech database with ground truth labels available, two interesting observations emerge. First, majority voting performs significantly worse than the other three methods in estimating the given ground truth labels. Second, when performing classification using the combined labels, the two ranked-list voting methods perform best. We then propose evaluator reliability-weighted versions of these two methods, which improve classification accuracy even further.
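The ranked-list combination step can be illustrated with a minimal Borda count sketch; the emotion labels and rankings below are hypothetical examples, not data from the paper's database, and this is a plain unweighted Borda count rather than the reliability-weighted variant the abstract proposes:

```python
from collections import defaultdict

def borda_count(rankings):
    """Combine ranked lists via Borda count: with k candidates, the
    candidate at 0-based position i of an evaluator's ranking earns
    k - 1 - i points; totals are summed across evaluators."""
    scores = defaultdict(int)
    for ranking in rankings:
        k = len(ranking)
        for i, label in enumerate(ranking):
            scores[label] += k - 1 - i
    # Order labels from highest to lowest total score.
    return sorted(scores, key=lambda lab: -scores[lab])

# Three hypothetical evaluators rank four emotion categories for one clip.
rankings = [
    ["angry", "sad", "neutral", "happy"],
    ["sad", "angry", "neutral", "happy"],
    ["angry", "neutral", "sad", "happy"],
]
print(borda_count(rankings)[0])  # combined top label: "angry"
```

Here "angry" wins with 8 points (3 + 2 + 3) against 6 for "sad", even though majority voting on top labels alone would already agree; the ranked lists additionally yield a full combined ordering of the categories.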
Kartik Audhkhasi, Shrikanth S. Narayanan