This work extends and improves a recently introduced (Dec. 2007) dynamic Bayesian network (DBN) based audio-visual automatic speech recognition (AVASR) system. That system models ...
Labeling video data is an essential prerequisite for many vision applications that depend on training data, such as visual information retrieval, object recognition, and human act...
Extracting key-frames is the first step for efficient content-based indexing, browsing and retrieval of the video data in commercial movies. Most of the existing research deals wi...
Spoken Document Retrieval (SDR) is a promising technology for enhancing the utility of spoken materials. After the spoken documents have been transcribed by using a Large Vocabula...
ion when we annotate content. This therefore requires us to investigate and model video semantics. Because of the type and volume of data, general-purpose approaches are likely to ...