If multiple evaluators analyse the outcomes of a single user test, the agreement between their lists of identified usability problems tends to be limited. This is called the 'evaluator effect'. In the present paper, three user tests from various domains are reported and their evaluator effects measured. In all three studies, the evaluator effect proved to be smaller than in Jacobsen et al.'s (1998) study, but still present. Through detailed analysis of the data, it was possible to identify various causes of the evaluator effect, ranging from inaccuracies in logging and misheard verbal utterances to differences in interpreting user intentions. Suggested strategies for managing the evaluator effect are: conducting a systematic and detailed data analysis with automated logging, discussing specific usability problems with other evaluators, and having the entire data analysis performed by multiple evaluators.
Arnold P. O. S. Vermeeren, Ilse van Kesteren, Math