In future video communication services, the user's communication device, such as a PC, laptop, PDA, or mobile phone, will be equipped with new interaction modalities. These can be cameras and microphones on the capturing side, and speech synthesis and video/3D graphics on the rendering side. Haptic and tactile interfaces are also becoming available. These modalities help the user interact more intuitively with complex devices and tools, and they enable new services. A key challenge for these new modalities is therefore robustness and stability under general conditions in arbitrary environments. Furthermore, inexperienced users should be able to use these new capabilities without dedicated knowledge of device settings or algorithms. In this paper, we present some key components of a robust vision-based user interface, which are integrated into an advanced future video communication service.