Restoring punctuation and capitalization in transcribed speech

Adding punctuation and capitalization greatly improves the readability of automatic speech transcripts. We discuss an approach for performing both tasks in a single pass using a purely text-based n-gram language model. We study the effect on performance of varying the n-gram order (from n = 3 to n = 6) and the amount of training data (from 58 million to 55 billion tokens). Our results show that using larger training data sets consistently improves performance, while increasing the n-gram order does not help nearly as much.
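The abstract does not describe the decoding procedure. The following is a toy sketch, under our own assumptions, of the general idea of a single-pass, text-only approach: punctuation marks and casing variants are treated as ordinary tokens of an n-gram language model, and a Viterbi search picks the best-scoring token sequence. The bigram order, add-k smoothing with `K = 0.1`, the three-way punctuation inventory (`""`, `","`, `"."`), the two casing variants per word, and the names `train_bigram`, `logprob` and `restore` are all hypothetical illustrations. They are not taken from the paper, which uses orders 3 to 6 and up to 55 billion training tokens.

```python
import math
from collections import Counter

# Hypothetical punctuation events that may follow each word ("" means none).
PUNCT = ["", ",", "."]
K = 0.1  # add-k smoothing constant: an assumption, not the paper's smoothing


def train_bigram(sentences):
    """Count bigrams over tokenized training sentences that already carry
    punctuation tokens and correct casing."""
    histories, bigrams, vocab = Counter(), Counter(), set()
    for toks in sentences:
        seq = ["<s>"] + toks + ["</s>"]
        histories.update(seq[:-1])  # tokens that serve as a history
        bigrams.update(zip(seq, seq[1:]))
        vocab.update(seq)
    return histories, bigrams, len(vocab)


def logprob(model, prev, tok):
    """Add-k smoothed log P(tok | prev)."""
    histories, bigrams, v = model
    return math.log((bigrams[(prev, tok)] + K) / (histories[prev] + K * v))


def restore(model, words):
    """Choose a casing for each word and the punctuation that follows it.

    With a bigram model the Viterbi state is just the last emitted token,
    so keeping the best hypothesis per last token is exact.
    """
    states = {"<s>": (0.0, [])}
    for w in words:
        new = {}
        for prev, (score, out) in states.items():
            for form in dict.fromkeys([w.lower(), w.capitalize()]):
                s_form = score + logprob(model, prev, form)
                for p in PUNCT:
                    if p:  # emit the word, then a punctuation token
                        s = s_form + logprob(model, form, p)
                        last, toks = p, out + [form, p]
                    else:  # emit the word only
                        s, last, toks = s_form, form, out + [form]
                    if last not in new or s > new[last][0]:
                        new[last] = (s, toks)
        states = new
    # Score the end-of-sentence transition and keep the best hypothesis.
    best = max(states.items(),
               key=lambda kv: kv[1][0] + logprob(model, kv[0], "</s>"))
    return best[1][1]


if __name__ == "__main__":
    corpus = [s.split() for s in ["The cat sat .", "The dog ran .", "The cat ran ."]]
    model = train_bigram(corpus)
    print(" ".join(restore(model, ["the", "cat", "sat"])))  # The cat sat .
```

Moving to order n would key the search state on the last n-1 emitted tokens instead of one. The paper's finding that more training data helps more than a higher order concerns its own models and is not something this toy example can demonstrate.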

Agustín Gravano, Martin Jansche, Michiel Bacchiani

Automatic Speech Transcripts | ICASSP 2009 | N-gram Order | Signal Processing | Text-based N-gram Language

Post Info

Added	21 May 2010
Updated	21 May 2010
Type	Conference
Year	2009
Where	ICASSP
Authors	Agustín Gravano, Martin Jansche, Michiel Bacchiani

Sciweavers
