In this paper we present a method for estimating a person's head pose with a stereo camera. Our approach focuses on the application of human-robot interaction, where people may be farther away from the camera and move freely around a room. We show that depth information acquired from a stereo camera not only helps to improve the accuracy of the pose estimation, but also improves the robustness of the system when the lighting conditions change. The estimation is based on neural networks, which are trained to compute the head pose from grayscale and disparity images of the stereo camera. It can handle pan and tilt rotations from −90° to +90°. Our system does not require any manual initialization and does not suffer from drift during an image sequence. Moreover, the system is capable of real-time processing.
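To make the described pipeline concrete, the following is a minimal sketch of a neural network that regresses pan and tilt angles from a two-channel input stacking a grayscale image and the corresponding disparity map, as the abstract describes. The architecture shown (the `HeadPoseNet` class, the layer sizes, and the 64×64 crop resolution) is hypothetical and not taken from the paper; it only illustrates the input/output structure of such a regressor.

```python
import torch
import torch.nn as nn

class HeadPoseNet(nn.Module):
    """Hypothetical regressor: 2-channel (grayscale + disparity) crop -> (pan, tilt).

    The paper specifies neural networks trained on grayscale and disparity
    images; the exact architecture here is an illustrative assumption.
    """
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(2, 16, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 4 * 4, 64), nn.ReLU(),
            nn.Linear(64, 2),  # pan and tilt angles, each in [-90, +90] degrees
        )

    def forward(self, x):
        return self.head(self.features(x))

# Usage: a batch of head crops, channel 0 = grayscale, channel 1 = disparity.
net = HeadPoseNet()
batch = torch.randn(8, 2, 64, 64)   # assumed 64x64 crop size
pan_tilt = net(batch)               # shape (8, 2)
```

Stacking disparity as an extra input channel is one straightforward way to fuse the depth cue with intensity; it leaves the rest of the regressor unchanged while letting the network exploit depth under varying illumination.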