We explored the reliability of detecting a learner's affect from conversational features extracted from interactions with AutoTutor, an intelligent tutoring system that helps students learn by holding a conversation in natural language. Training data were collected in a learning session with AutoTutor, after which the affective states of the learner were rated by the learner, a peer, and two trained judges. Inter-rater reliability scores indicated that the classifications of the trained judges were more reliable than the novice judges. Seven data sets that temporally integrated the affective judgments with the dialogue features of each learner were constructed. The first four datasets corresponded to the judgments of the learner, a peer, and two trained judges, while the remaining three data sets combined judgments of two or more raters. Multiple regression analyses confirmed the hypothesis that dialogue features could significantly predict the affective states of boredom, confus...
Sidney K. D'Mello, Scotty D. Craig, Amy M. Withers