A user’s focus of attention plays an important role in human–computer interaction applications, such as ubiquitous computing environments and intelligent spaces, where the user’s goal and intent have to be continuously monitored. In this paper, we are interested in modeling people’s focus of attention in a meeting situation. We propose to model participants’ focus of attention from multiple cues. We have developed a system to estimate participants’ focus of attention from gaze directions and sound sources. We employ an omnidirectional camera to simultaneously track participants’ faces around a meeting table and use neural networks to estimate their head poses. In addition, we use microphones to detect who is speaking. The system predicts participants’ focus of attention from acoustic and visual information separately. The system then combines the output of the audio- and video-based focus of attention predictors. We have evaluated the system using the data from three r...