We examine in some detail Mel Frequency Cepstral Coefficients (MFCCs), the dominant features used for speech recognition, and investigate their applicability to modeling music. ...
The SRI speaker recognition system for the 2010 NIST speaker recognition evaluation (SRE) incorporates multiple subsystems with a variety of features and modeling techniques. We d...
Nicolas Scheffer, Luciana Ferrer, Martin Graciaren...
We present MARS (Multilingual Automatic tRanslation System), a research prototype speech-to-speech translation system. MARS is aimed at two-way conversational spoken language trans...
Yuqing Gao, Bowen Zhou, Zijian Diao, Jeffrey S. So...
We introduce Bayesian sensing hidden Markov models (BS-HMMs) to represent speech data based on a set of state-dependent basis vectors. By incorporating the prior density of sensin...
We introduce a direct model for speech recognition that assumes an unstructured, i.e., flat text output. The flat model allows us to model arbitrary attributes and dependences o...
Georg Heigold, Geoffrey Zweig, Xiao Li, Patrick Ng...