We propose a visual event recognition framework for consumer domain videos by leveraging a large amount of loosely labeled web videos (e.g., from YouTube). First, we propose a new...
In this work we present a novel multi-modal mixed-state dynamic Bayesian network (DBN) for robust meeting event classification. The model uses information from lapel microphones,...
The goal of the speech segments extraction process is to separate acoustic events of interest (the speech segment to be recognised) in a continuously recorded signal from other par...
In many social settings, images of groups of people are captured. The structure of this group provides meaningful context for reasoning about individuals in the group, and about th...