Learning Joint Statistical Models for Audio-Visual Fusion and Segregation


People can understand complex auditory and visual information, often using one to disambiguate the other. Automated analysis, even at a low level, faces severe challenges, including the lack of accurate statistical models for the signals, their high dimensionality, and their varied sampling rates. Previous approaches [6] assumed simple parametric models for the joint distribution which, while tractable, cannot capture the complex signal relationships. We learn the joint distribution of the visual and auditory signals using a nonparametric approach. First, we project the data into a maximally informative, low-dimensional subspace suitable for density estimation. We then model the complicated stochastic relationships between the signals using a nonparametric density estimator. These learned densities allow processing across signal modalities. We demonstrate, on synthetic and real signals, localization in video of the face that is speaking in audio, and, conversely, audio enhancement of a ...
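
The two steps named in the abstract (a low-dimensional projection followed by nonparametric density estimation) can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: a Gaussian Parzen-window estimator with a fixed bandwidth, a fixed linear projection rather than a learned one, and mutual information as the measure of how "informative" the projected pair is. The function names `project`, `parzen_density`, and `mutual_information` are hypothetical and are not taken from the paper.

```python
import numpy as np


def project(frames, weights):
    """Linear projection of (N, D) feature frames onto a 1-D subspace.

    `weights` is a fixed (D,) vector here; learning it is outside this sketch.
    """
    return np.asarray(frames) @ np.asarray(weights)


def parzen_density(samples, points, bandwidth):
    """Gaussian Parzen-window density estimate.

    samples: (N, d) array used to build the estimate.
    points:  (M, d) array of locations where the density is evaluated.
    """
    d = samples.shape[1]
    diff = points[:, None, :] - samples[None, :, :]   # (M, N, d)
    sq_dist = np.sum(diff ** 2, axis=2)                # (M, N)
    norm = (2.0 * np.pi * bandwidth ** 2) ** (d / 2.0)
    return np.mean(np.exp(-sq_dist / (2.0 * bandwidth ** 2)), axis=1) / norm


def mutual_information(u, v, bandwidth):
    """Resubstitution estimate of I(u; v) in nats from paired 1-D samples.

    Densities are evaluated at the same samples that define them, so the
    estimate is biased; it is meant for comparison, not as an exact value.
    """
    u = np.asarray(u, dtype=float).reshape(-1, 1)
    v = np.asarray(v, dtype=float).reshape(-1, 1)
    joint = np.hstack([u, v])
    p_uv = parzen_density(joint, joint, bandwidth)
    p_u = parzen_density(u, u, bandwidth)
    p_v = parzen_density(v, v, bandwidth)
    return float(np.mean(np.log(p_uv) - np.log(p_u) - np.log(p_v)))
```

Under these assumptions, a projected video feature that moves with the projected audio feature should score a higher `mutual_information` than an unrelated one, which is the kind of comparison that could drive speaker localization. The bandwidth and the projection weights are free parameters in this sketch.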



Complex Signal Relationships | Faces Severe Challenges | Joint Distribution | NIPS 2000 | NIPS 2007 |


Added	01 Nov 2010
Updated	01 Nov 2010
Type	Conference
Year	2000
Where	NIPS
Authors	John W. Fisher III, Trevor Darrell, William T. Freeman, Paul A. Viola
