Statistical learning methods are commonly applied in content-based video and image retrieval. Such methods require a large number of examples, which are usually obtained through manual annotation: human raters review images and assign semantic concept labels. Human judgement, however, cannot be regarded as the ultimate truth because it is subjective and prone to error. We can address these issues by collecting multiple judgements per example, but evaluating and resolving disagreement between raters is problematic. Moreover, the nature of rater disagreement and how to minimise it are not yet well explored. In this paper we present the results of a user study that was specifically designed to investigate human judgement of digital imagery. We discuss the influence of factors such as the size and type of the semantic vocabulary on inter-rater agreement. We demonstrate the application of latent class analysis for combining multiple judgements. Known from applicat...
Timo Volkmer, James A. Thom, Seyed M. M. Tahaghoghi