Speech recognition is usually based on Hidden Markov Models (HMMs), which represent the temporal dynamics of speech very efficiently, and Gaussian mixture models, which do non-opt...
This paper presents a novel Gaussianized vector representation for scene images by an unsupervised approach. First, each image is encoded as an ensemble of orderless bag of featur...
Hao Tang, Mark Hasegawa-Johnson, Thomas S. Huang, ...
In this paper, we consider speaker identification for the co-channel scenario in which speech mixture from speakers is recorded by one microphone only. The goal is to identify both...
Rahim Saeidi, Pejman Mowlaee, Tomi Kinnunen, Zheng...
We propose a statistical framework for high-level feature extraction that uses SIFT Gaussian mixture models (GMMs) and audio models. SIFT features were extracted from all the im...
Visual interpretation of events requires both an appropriate representation of change occurring in the scene and the application of semantics for differentiating between different...