

Restoring punctuation and capitalization in transcribed speech

Adding punctuation and capitalization greatly improves the readability of automatic speech transcripts. We discuss an approach for performing both tasks in a single pass using a purely text-based n-gram language model. We study the effect on performance of varying the n-gram order (from n = 3 to n = 6) and the amount of training data (from 58 million to 55 billion tokens). Our results show that using larger training data sets consistently improves performance, while increasing the n-gram order does not help nearly as much.
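To make the single-pass idea concrete, here is a minimal sketch, not the authors' implementation. It assumes a toy bigram model (the paper studies n = 3 to n = 6) with add-k smoothing, a four-sentence training corpus invented for illustration, and only two case forms and two punctuation marks. All function names (`train_bigram`, `logprob`, `restore`) are hypothetical. Punctuation marks are treated as ordinary tokens of the language model, and a Viterbi search picks the cased, punctuated token sequence with the highest probability for a lowercase, unpunctuated input.

```python
import math
from collections import Counter

PUNCT = ("", ",", ".")  # "" means no punctuation after the word
K = 0.1                 # add-k smoothing constant (an assumption)

def train_bigram(sentences):
    """Count unigrams and bigrams over cased, punctuated token streams."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        seq = ["<s>"] + tokens + ["</s>"]
        unigrams.update(seq)
        bigrams.update(zip(seq, seq[1:]))
    return unigrams, bigrams

def logprob(unigrams, bigrams, prev, tok):
    """Add-k smoothed bigram log-probability of tok following prev."""
    vocab = len(unigrams)
    return math.log((bigrams[(prev, tok)] + K) / (unigrams[prev] + K * vocab))

def restore(words, unigrams, bigrams):
    """Viterbi search over (cased form, trailing punctuation) for each word."""
    # State = last emitted token; value = (log score, tokens emitted so far).
    best = {"<s>": (0.0, [])}
    for w in words:
        # Candidate case forms seen in training; fall back to the raw word.
        forms = [f for f in dict.fromkeys((w, w.capitalize())) if f in unigrams] or [w]
        nxt = {}
        for prev, (score, out) in best.items():
            for form in forms:
                s_form = score + logprob(unigrams, bigrams, prev, form)
                for p in PUNCT:
                    s, last, toks = s_form, form, out + [form]
                    if p:  # optionally emit a punctuation token after the word
                        s += logprob(unigrams, bigrams, form, p)
                        last, toks = p, toks + [p]
                    if last not in nxt or s > nxt[last][0]:
                        nxt[last] = (s, toks)
        best = nxt
    # Close the sequence with the end-of-sentence transition.
    final = max(
        best.items(),
        key=lambda kv: kv[1][0] + logprob(unigrams, bigrams, kv[0], "</s>"),
    )
    return final[1][1]

# Toy training data, invented for illustration only.
corpus = [
    "Hello , John .".split(),
    "Hello , Mary .".split(),
    "John went home .".split(),
    "Mary went home , John .".split(),
]
unigrams, bigrams = train_bigram(corpus)
print(" ".join(restore("hello john".split(), unigrams, bigrams)))
```

On this toy corpus the input `hello john` decodes to `Hello , John .`. Because the state of the search is just the last emitted token, the sketch is only valid for a bigram model; a higher-order model would need the last n - 1 tokens as state.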
Added 21 May 2010
Updated 21 May 2010
Type Conference
Year 2009
Authors Agustín Gravano, Martin Jansche, Michiel Bacchiani