ICASSP
2010
IEEE

Spoken language translation from parallel speech audio: Simultaneous interpretation as SLT training data

In recent work, we proposed an alternative to parallel text as translation model (TM) training data: audio recordings of parallel speech (pSp), as it occurs in any communication scenario where interpreters are involved. Although interpretation compares poorly to translation, we reported surprisingly strong translation results for systems based on pSp-trained TMs. This work extends the use of pSp as a data source for unsupervised training of all major models involved in statistical spoken language translation. We consider the scenario of speech translation between a resource-rich and a resource-deficient language. Our seed models are based on 10 hours of transcribed audio and parallel text comprising 100k translated words. With the help of 92 hours of untranscribed pSp audio, and by taking advantage of the redundancy inherent to pSp (the same information is given twice, in two languages), we report significant improvements for the resource-deficient acoustic, language, and translation models....
Matthias Paulik, Alex Waibel
Added 25 Jan 2011
Updated 25 Jan 2011
Type Journal
Year 2010
Where ICASSP
Authors Matthias Paulik, Alex Waibel