Interactivity is a key concept in modern content-based retrieval. Therefore, in addition to the ability to learn from user-generated data, interfaces that are easy and intuitive to use are an important area of research in (multi)media retrieval. In this contribution, we focus on the latter aspect and show how different modalities, such as speech and gestures on super-sized touch screens, can be integrated to achieve intuitive interaction. To evaluate our approach, we conducted a series of usability experiments. Their results demonstrate that our multimodal user interface allows for interactive image retrieval that is both comfortable and successful.