This paper describes a study in which student-created diagrams of arguments in an ill-defined domain were manually graded by two independent human graders. The graders generally agreed with each other on their grades, but their agreement was lower than one would expect in well-defined domains, and it was higher for solutions of extreme quality.

Keywords. Argumentation, diagrams, inter-rater reliability, ill-defined domains
Niels Pinkwart, Collin Lynch, Kevin D. Ashley, Vin