In this paper, we present the development of a framework based on the Realtime Database (RTDB) for processing multimodal data. The framework allows input and output modules to be integrated readily. Furthermore, asynchronous data streams from different sources can be processed in an approximately synchronous manner. Depending on the included modules, both online and offline data processing are possible. The goal is to establish a truly multimodal interaction system that can recognize and react to the situations relevant to human-robot interaction.
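The approximately synchronous processing of asynchronous streams mentioned above could, for instance, be realized by buffering timestamped samples per source and pairing them by nearest timestamp within a tolerance. The following is a minimal illustrative sketch of that general idea, not the RTDB implementation itself; all class and variable names here are hypothetical.

```python
import bisect

class StreamBuffer:
    """Hypothetical buffer of timestamped samples from one asynchronous source."""

    def __init__(self):
        self.times = []   # sample timestamps, kept sorted
        self.values = []  # sample payloads, parallel to self.times

    def push(self, t, value):
        # Insert the sample so that timestamps stay ordered even if
        # samples arrive slightly out of order.
        i = bisect.bisect(self.times, t)
        self.times.insert(i, t)
        self.values.insert(i, value)

    def nearest(self, t, tolerance):
        """Return the sample closest to time t, or None if none is within tolerance."""
        if not self.times:
            return None
        i = bisect.bisect(self.times, t)
        candidates = []
        if i > 0:
            candidates.append(i - 1)
        if i < len(self.times):
            candidates.append(i)
        best = min(candidates, key=lambda j: abs(self.times[j] - t))
        if abs(self.times[best] - t) <= tolerance:
            return self.values[best]
        return None

# Illustrative usage: a fast stream (e.g. audio features) and a slower
# stream (e.g. video frames) are fused at the slow stream's timestamps.
audio, video = StreamBuffer(), StreamBuffer()
audio.push(0.00, "a0")
audio.push(0.01, "a1")
audio.push(0.02, "a2")
video.push(0.015, "v0")

# Pair the video frame with the nearest audio sample within 20 ms.
pair = (video.nearest(0.015, 0.02), audio.nearest(0.015, 0.02))
```

In this sketch, each input module pushes into its own buffer at its native rate, and a fusion step queries all buffers at a common reference time, which yields the approximately synchronous view of inherently asynchronous data.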