This paper describes experiments in automatic recognition of context-independent phoneme strings from meeting data using audiovisual features. Visual features are known to improve ...
Deep Belief Networks (DBNs) are multi-layer generative models. They can be trained to model windows of coefficients extracted from speech and they discover multiple layers of fea...
Abdel-rahman Mohamed, Tara N. Sainath, George Dahl...
This contribution presents a wideband (50 Hz – 7 kHz) speech enhancement system that operates in the frequency domain. As a novel feature, techniques known from artificial ...
Thomas Esch, Florian Heese, Bernd Geiser, Peter Va...
Defining suitable features for environmental sounds is an important problem in an automatic acoustic scene recognition system. As with most pattern recognition problems, extracti...
Social network analysis has become a common technique for modeling and quantifying the properties of social interactions. In this paper, we propose an integrated framework to explore th...