This paper proposes a method for separating the signals of individual musical instruments from monaural musical audio. The mixture signal is modeled as a sum of the spectra of ind...
Sparse coding networks, which utilize unsupervised learning to maximize coding efficiency, have successfully reproduced response properties found in primary visual cortex [1]. Ho...
William K. Coulter, Cristopher J. Hillar, Guy Isle...
We investigate the usefulness of across-phone variability for speaker recognition in a joint factor analysis (JFA) framework. We estimate the variability as across-phone covariance wi...
This paper studies the influence of n-gram language models in the recognition of sung phonemes and words. We train uni-, bi-, and trigram language models for phonemes and bi- and...
This paper deals with the problem of under-determined convolutive blind source separation. We model the contribution of each source to all mixture channels in the time-frequency d...
In this paper, we present a novel approach that relaxes the stereo-data constraint required by a series of algorithms for noise-robust speech recognition. As a demonstration...
Nonnegative matrix factorization (NMF) is a widely used tool for obtaining low-rank approximations of nonnegative data such as digital images, audio signals, textual data, financ...
We present a technique for following a live performance in the situation where a score is not available. Making use of a local alignment between recent and longer-term musical inf...
In this paper, independent component analysis (ICA) in a subband domain is extended into a feed-forward network. The feed-forward network maximizes mutual independence of se...