Abstract. We present a neural architecture for gesture-based interaction between a mobile robot and human users. One crucial problem for natural interface techniques is robustness under highly varying environmental conditions. Therefore, we propose a multiple-cue approach for the localisation of a potential user in the operation field, followed by the acquisition and interpretation of the user's gestural instructions. The whole approach is motivated in the context of a reliable operation scenario, but can easily be extended to other applications, such as videoconferencing.