In a video conference, participants usually see the video of the current speaker. However, if somebody reacts (e.g., by nodding), the system should switch to that person's video. Current systems do not support this. We formulate camera selection as a pattern recognition problem and apply hidden Markov models (HMMs) to learn this behaviour, so our system can easily be adapted to different meeting scenarios. Furthermore, while current systems stay on the speaker, our system switches when somebody reacts. In an experimental section we show that, compared to a desired output, a current system shows the wrong camera more than half of the time (frame error rate, FER, of 53%), whereas our system selects the wrong camera only about a quarter of the time (FER 27%).
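As an illustrative sketch only (not the paper's actual model or parameters), camera selection with an HMM amounts to decoding the most likely camera sequence from observed participant activity, e.g. via the Viterbi algorithm. All states, observation symbols, and probabilities below are invented assumptions:

```python
# Hypothetical sketch: HMM camera selection via Viterbi decoding.
# Hidden states = which camera to show; observations = coarse activity labels.
# All model parameters below are made up for illustration.

import math

states = ["cam_speaker", "cam_listener"]

# Transition probabilities: staying on the current camera is likely.
trans = {
    "cam_speaker": {"cam_speaker": 0.8, "cam_listener": 0.2},
    "cam_listener": {"cam_speaker": 0.4, "cam_listener": 0.6},
}
# Emission probabilities: which activity we expect to observe per state.
emit = {
    "cam_speaker": {"speech": 0.7, "nod": 0.1, "silence": 0.2},
    "cam_listener": {"speech": 0.1, "nod": 0.6, "silence": 0.3},
}
start = {"cam_speaker": 0.7, "cam_listener": 0.3}

def viterbi(observations):
    """Return the most likely camera sequence for the observations."""
    # Log-probabilities avoid numerical underflow on long sequences.
    v = [{s: math.log(start[s]) + math.log(emit[s][observations[0]])
          for s in states}]
    back = []
    for o in observations[1:]:
        col, ptr = {}, {}
        for s in states:
            best = max(states, key=lambda p: v[-1][p] + math.log(trans[p][s]))
            col[s] = (v[-1][best] + math.log(trans[best][s])
                      + math.log(emit[s][o]))
            ptr[s] = best
        v.append(col)
        back.append(ptr)
    # Trace back the best camera path.
    last = max(states, key=lambda s: v[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

# A nodding phase pulls the selection to the listener camera:
print(viterbi(["speech", "speech", "nod", "nod", "speech"]))
# → ['cam_speaker', 'cam_speaker', 'cam_listener', 'cam_listener', 'cam_speaker']
```

In a real system the observations would be audio/visual features rather than symbolic labels, and the transition and emission probabilities would be trained from annotated meeting recordings rather than set by hand.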