Multimodal Speaker Detection Using Error Feedback Dynamic Bayesian Networks

Design and development of novel human-computer interfaces poses a challenging problem: actions and intentions of users have to be inferred from sequences of noisy and ambiguous multi-sensory data such as video and sound. Temporal fusion of multiple sensors has been efficiently formulated using dynamic Bayesian networks (DBNs), which allow the power of statistical inference and learning to be combined with contextual knowledge of the problem. Unfortunately, simple learning methods can cause such appealing models to fail when the data exhibits complex behavior. We formulate a learning framework for DBNs based on error feedback and statistical boosting theory. We apply this framework to the problem of audio/visual speaker detection in an interactive kiosk environment using "off-the-shelf" visual and audio sensors (face, skin, texture, mouth motion, and silence detectors). Detection results obtained in this setup demonstrate the superiority of our learning framework over that of the c...
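
The abstract does not spell out the paper's learning algorithm. As a generic illustration of the error-feedback idea it names, the sketch below runs discrete AdaBoost over a naive Bayes classifier, which can be read as a single static time slice of a Bayesian network. This is not the authors' DBN method, and it ignores the temporal dependencies a DBN models. The binary sensor encoding, the toy data and every function name (`fit_weighted_nb`, `nb_predict`, `boost`, `predict`) are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical encoding: each example is a tuple of binary sensor outputs
# (face, skin, texture, mouth_motion, silence); labels are +1 (speaking) / -1.
Vec = Sequence[int]
Model = Dict[int, Tuple[float, List[float]]]


def fit_weighted_nb(X: List[Vec], y: List[int], w: List[float],
                    alpha: float = 1.0) -> Model:
    """Fit a naive Bayes model from weighted examples with additive smoothing."""
    scale = len(X) / sum(w)  # make weights behave like example counts
    total = sum(w) * scale
    model: Model = {}
    for c in (-1, 1):
        wc = sum(wi * scale for wi, yi in zip(w, y) if yi == c)
        prior = (wc + alpha) / (total + 2 * alpha)
        p_on = []  # P(x_j = 1 | class c) for each sensor j
        for j in range(len(X[0])):
            on = sum(wi * scale for wi, xi, yi in zip(w, X, y)
                     if yi == c and xi[j] == 1)
            p_on.append((on + alpha) / (wc + 2 * alpha))
        model[c] = (prior, p_on)
    return model


def nb_predict(model: Model, x: Vec) -> int:
    """Return the class with the highest log posterior (ties go to -1)."""
    best, best_score = -1, -math.inf
    for c, (prior, p_on) in model.items():
        score = math.log(prior)
        for xj, p in zip(x, p_on):
            score += math.log(p if xj == 1 else 1.0 - p)
        if score > best_score:
            best, best_score = c, score
    return best


def boost(X: List[Vec], y: List[int], rounds: int = 5) -> List[Tuple[float, Model]]:
    """Discrete AdaBoost: each round refits on weights that emphasise past errors."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble: List[Tuple[float, Model]] = []
    for _ in range(rounds):
        model = fit_weighted_nb(X, y, w)
        pred = [nb_predict(model, x) for x in X]
        err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
        if err == 0:
            ensemble.append((1.0, model))  # perfect fit; later rounds would repeat it
            break
        if err >= 0.5:
            break  # no better than chance on the reweighted data
        a = 0.5 * math.log((1 - err) / err)
        ensemble.append((a, model))
        # Error feedback: raise weights of misclassified examples, lower the rest.
        w = [wi * math.exp(-a * yi * p) for wi, p, yi in zip(w, pred, y)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble


def predict(ensemble: List[Tuple[float, Model]], x: Vec) -> int:
    """Weighted vote of the boosted models."""
    score = sum(a * nb_predict(m, x) for a, m in ensemble)
    return 1 if score >= 0 else -1


# Toy data (invented): "speaking" roughly means mouth motion without silence.
X = [(1, 1, 1, 1, 0), (1, 1, 0, 1, 0), (1, 0, 1, 1, 0), (1, 1, 1, 1, 0),
     (1, 1, 1, 0, 1), (1, 1, 1, 0, 0), (1, 1, 0, 1, 1), (0, 0, 0, 0, 1),
     (1, 0, 1, 0, 1), (0, 1, 0, 0, 0)]
y = [1, 1, 1, 1, -1, -1, -1, -1, -1, -1]
ens = boost(X, y, rounds=5)
print([predict(ens, x) for x in X])
```

On this small toy set, the first naive Bayes model already separates the classes, so the loop stops after one round. On noisier data, each further round refits the model on weights shifted toward the examples the previous model got wrong. That reweighting is the "error feedback" in a boosting sense.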

Vladimir Pavlovic, James M. Rehg, Ashutosh Garg, Thomas S. Huang

Ambiguous Multi-sensory Data | Audio/visual Speaker Detection | Computer Vision | CVPR 2000 | Setup Demonstrate Superiority | Simple Learning Methods | Statistical Boosting Theory

Post Info

Added	12 Oct 2009
Updated	30 Oct 2009
Type	Conference
Year	2000
Where	CVPR
Authors	Vladimir Pavlovic, James M. Rehg, Ashutosh Garg, Thomas S. Huang

Sciweavers