Face-to-face interaction between people is generally effortless and effective. We exchange glances, take turns speaking and make facial and manual gestures to achieve the goals of the dialogue. This paper describes an action composition and selection architecture of characters capable of full-duplex, real-time face-to-face interaction with a human. This architecture is part of a computational model of psychosocial dialogue skills, called †mir, that bridges between multimodal perception and multimodal action generation. To test the architecture, a prototype humanoid has been implemented, named Gandalf, who commands a graphical model of the solar system, and can engage in task-directed dialogue with people using speech, manual and facial gesture. Gandalf has been tested in interaction with users and has been shown capable of fluid turn-taking and multimodal dialogue. The primary focus in this paper will be on the action selection mechanisms and low-level composition of motor commands. ...
Kristinn R. Thórisson