Indexing audio signals directly in the transform domain can potentially save a significant amount of computation when working on a large database of signals stored in a lossy compr...
To solve the knowledge bottleneck problem, active learning has been widely used for its ability to automatically select the most informative unlabeled examples for human annotation...
Jingbo Zhu, Huizhen Wang, Benjamin K. Tsou, Matthe...
We present an approach to music identification based on weighted finite-state transducers and Gaussian mixture models, inspired by techniques used in large-vocabulary speech recogn...
We consider inference in a general data-driven object-based model of multichannel audio data, assumed generated as a possibly underdetermined convolutive mixture of sourc...
Multiple pitch estimation consists of estimating the fundamental frequencies and saliences of pitched sounds over short time frames of an audio signal. This task forms the basis of...
Knowing the orientation of a talker in the focal area of a large-aperture microphone array enables the development of better beamforming algorithms (to obtain higher-quality speech...
This paper considers a psychoacoustically constrained and distortion minimized speech enhancement algorithm. Noise reduction, in general, leads to speech distortion, and ...
We consider the problem of extracting the source signals from an under-determined convolutive mixture assuming known mixing filters. State-of-the-art methods operate in the time-fr...
Existing binaural approaches to speech segregation place an exclusive burden on cues related to the location of sound sources in space. These approaches can achieve excellent perfo...
Various methods have recently appeared to transform foreign-accented speech into its native-accented counterpart. Evaluation of these accent conversion methods requires extensive l...