In the context of human-computer interaction, information about head pose is an important cue for inferring a person's focus of attention. In this paper, we present an approach to estimating the horizontal head rotation of people inside a smart room. The room is equipped with multiple cameras that aim to provide at least one facial view of the user at any location in the room. We use neural networks, trained on samples of rotated heads, to classify head rotation in each camera view. Whenever more than one estimate of head rotation is available, we combine the individual estimates into one joint hypothesis. We show experimentally that combining the estimates from multiple cameras with the proposed scheme reduces the mean error for unknown users by up to 50%.
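To illustrate one plausible form of such a combination step, the sketch below averages per-camera posterior distributions over discretized rotation classes and selects the most likely joint class. This is a minimal sketch under stated assumptions, not the paper's actual scheme: the class layout (8 bins of 45°), the uniform averaging rule, the function name combine_estimates, and the assumption that all per-camera estimates are already expressed in a common reference frame are all hypothetical.

```python
import numpy as np

# Assumption: horizontal head rotation is discretized into 8 classes of
# 45 degrees each (0, 45, ..., 315), and each camera's neural network
# outputs a posterior probability vector over these classes.
NUM_CLASSES = 8
CLASS_ANGLES = np.arange(NUM_CLASSES) * 360.0 / NUM_CLASSES

def combine_estimates(posteriors: list[np.ndarray]) -> float:
    """Fuse per-camera class posteriors into one joint rotation hypothesis.

    posteriors: one length-NUM_CLASSES probability vector per camera view
    in which a face was visible. Each vector is assumed to be expressed
    in a common (room-centered) reference frame. Uniform averaging is an
    illustrative choice; confidence-weighted fusion is equally plausible.
    """
    joint = np.mean(np.stack(posteriors), axis=0)  # average the posteriors
    return float(CLASS_ANGLES[np.argmax(joint)])   # most likely rotation class

# Usage: two camera views that largely agree on the 45-degree class.
cam_a = np.array([0.05, 0.60, 0.20, 0.05, 0.03, 0.03, 0.02, 0.02])
cam_b = np.array([0.10, 0.50, 0.25, 0.05, 0.04, 0.03, 0.02, 0.01])
print(combine_estimates([cam_a, cam_b]))  # -> 45.0
```

Averaging class posteriors rather than raw angle estimates sidesteps the wrap-around problem at 0°/360° and lets a camera with an ambiguous (flat) posterior contribute less to the joint decision than one with a confident peak.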