The production of closed captions is an important but expensive process in video broadcasting. We propose a method to generate highly accurate off-line captions efficiently. Our s...
We present an approach to detecting and recognizing spoken isolated phrases based solely on visual input. We adopt an architecture that first employs discriminative detection of ...
Kate Saenko, Karen Livescu, Michael Siracusa, Kevi...
Texts generated by automatic speech recognition (ASR) systems have some specificities, related to the idiosyncrasies of oral productions or the principles of ASR systems, that mak...
In the past decade great technological advances have been made in internet services, personal computers, telecommunications, media and entertainment. Many of these advances have be...
In this paper, we consider speaker identification for the co-channel scenario, in which a speech mixture from the speakers is recorded by a single microphone. The goal is to identify both...
Rahim Saeidi, Pejman Mowlaee, Tomi Kinnunen, Zheng...