Interobserver reliability and reproducibility are well-known problems in experimental research within the social and behavioural sciences. We propose the use of formal techniques and tools to mitigate these problems. To this end, we extend standard research methods in the following way: video material is transcribed in terms of basic score units, and automated tools are used both to define the more complex score units in logic, in terms of the basic score units, and to check these complex score units automatically against the transcripts. Furthermore, we use pilot experiments to determine the basic score units. We show that the proposed extension significantly improves interobserver reliability and reproducibility. An important additional benefit of our method is that the repository of annotations remains useful even if the researcher later decides to test other complex score units, provided they can be formulated in terms of the basic score units used to annotate the collected data.
Arjen van Alphen, Tibor Bosse, Catholijn M. Jonker
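
The idea of defining complex score units over a transcript of basic score units can be illustrated with a minimal sketch. This is not the authors' tooling or logic language; the `Event` record, the unit labels "question" and "answer", the function `responsive_answer_count`, and the 5-second window are all hypothetical choices made only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    """One annotated occurrence of a basic score unit (hypothetical format)."""
    time: float  # seconds from the start of the recording
    actor: str   # who performed the behaviour
    unit: str    # label of the basic score unit


def responsive_answer_count(transcript: List[Event], window: float = 5.0) -> int:
    """Hypothetical complex score unit: count the questions that are followed,
    within `window` seconds, by an answer from a different actor."""
    count = 0
    for q in transcript:
        if q.unit != "question":
            continue
        answered = any(
            a.unit == "answer"
            and a.actor != q.actor
            and q.time < a.time <= q.time + window
            for a in transcript
        )
        if answered:
            count += 1
    return count


# A toy transcript, invented for this example.
transcript = [
    Event(1.0, "A", "question"),
    Event(3.5, "B", "answer"),
    Event(10.0, "B", "question"),
    Event(20.0, "A", "answer"),
]

print(responsive_answer_count(transcript))               # default 5-second window
print(responsive_answer_count(transcript, window=10.0))  # a redefined complex unit
```

Because the complex unit is computed from the stored basic annotations, changing its definition (here, widening the window) only requires re-running the check, not re-annotating the video material.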