We present a robust and portable vision-based skin and face detection system developed for a multiple-speaker teleconferencing system that employs both audio and video cues. An omni-directional video sensor provides a view of the entire visual hemisphere, allowing multiple dynamic views of all participants. Regions of skin are detected using simple statistical methods together with histogram color models for the skin and non-skin color classes. Skin regions belonging to the same person are grouped together, and the position of each person's face is inferred from simple spatial properties. Preliminary results suggest that the system can detect human faces in an omni-directional image despite the poor resolution inherent in such a sensor.
Bill Kapralos, Michael R. M. Jenkin, Evangelos E.
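
The abstract does not give the details of the skin classifier, so the following is only a minimal sketch of one common way to use histogram color models for skin and non-skin classes: a per-pixel likelihood-ratio test. The bin count, the RGB color space, the threshold, and all function names (`bin_index`, `build_histogram`, `is_skin`) are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter

BINS = 8  # assumed: bins per RGB channel (8**3 = 512 histogram cells)


def bin_index(pixel, bins=BINS):
    """Map an (r, g, b) pixel with 0-255 channels to a histogram cell."""
    r, g, b = pixel
    step = 256 // bins
    return (r // step, g // step, b // step)


def build_histogram(pixels, bins=BINS):
    """Return a normalized color histogram, i.e. P(color cell | class)."""
    counts = Counter(bin_index(p, bins) for p in pixels)
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}


def is_skin(pixel, skin_hist, non_skin_hist, threshold=1.0, eps=1e-6):
    """Label a pixel as skin when P(c | skin) / P(c | non-skin) >= threshold.

    eps avoids division by zero for cells never seen in the non-skin data.
    """
    cell = bin_index(pixel)
    p_skin = skin_hist.get(cell, 0.0)
    p_non_skin = non_skin_hist.get(cell, 0.0)
    return p_skin / (p_non_skin + eps) >= threshold


# Toy training data (made up for illustration, not from the paper).
skin_samples = [(220, 170, 140), (200, 150, 120), (230, 180, 150)]
non_skin_samples = [(30, 60, 200), (20, 200, 40), (240, 240, 240)]

skin_hist = build_histogram(skin_samples)
non_skin_hist = build_histogram(non_skin_samples)

# Classify two query pixels against the two class histograms.
print(is_skin((210, 165, 135), skin_hist, non_skin_hist))
print(is_skin((25, 50, 205), skin_hist, non_skin_hist))
```

In a full system the resulting per-pixel mask would then be grouped into connected regions before any face position is inferred; that step is not sketched here because the abstract gives no detail on the spatial properties used.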