Formatting Time-Aligned ASR Transcripts for Readability

We address the problem of formatting the output of an automatic speech recognition (ASR) system for readability while preserving the word-level timing information of the transcript. Our system enriches the ASR transcript with punctuation, capitalization, and properly written dates, times, and other numeric entities, and our approach can be applied to other formatting tasks. The method we describe combines hand-crafted grammars with a class-based language model trained on written text and relies on Weighted Finite State Transducers (WFSTs) to preserve the start and end times of each word.
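The timing-preservation idea can be illustrated with a minimal sketch. This is not the paper's WFST-based implementation: it swaps in a greedy longest-match lookup over a small rewrite table, and the names `Word`, `RULES`, and `format_transcript` are hypothetical. It only shows how a written token that replaces several spoken tokens can inherit the start time of the first and the end time of the last.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Word:
    text: str
    start: float  # start time in seconds
    end: float    # end time in seconds


# Hypothetical rewrite table (an assumption for illustration):
# a sequence of spoken tokens maps to one written form.
RULES: Dict[Tuple[str, ...], str] = {
    ("two", "thirty", "p", "m"): "2:30 p.m.",
    ("twenty", "ten"): "2010",
}


def format_transcript(
    words: List[Word], rules: Dict[Tuple[str, ...], str] = RULES
) -> List[Word]:
    """Greedy longest-match rewriting that keeps time alignment.

    A rewritten token spans from the start of the first spoken token
    it replaces to the end of the last one. Unmatched tokens pass
    through unchanged.
    """
    max_len = max((len(key) for key in rules), default=0)
    out: List[Word] = []
    i = 0
    while i < len(words):
        # Try the longest candidate span first.
        for n in range(min(max_len, len(words) - i), 0, -1):
            key = tuple(w.text for w in words[i:i + n])
            if key in rules:
                out.append(Word(rules[key], words[i].start, words[i + n - 1].end))
                i += n
                break
        else:
            # No rule matched at this position: copy the token as-is.
            out.append(words[i])
            i += 1
    return out
```

Given the spoken tokens "meet at two thirty p m" with per-word times, the four tokens "two thirty p m" collapse into a single "2:30 p.m." token covering their combined time span. A real system along the lines of the abstract would replace the lookup table with grammars and a language model composed as transducers.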

Maria Shugrina

ASR Transcript | Class-based Language Model | Computational Linguistics | Finite State Transducers | NAACL 2010

Added	14 Feb 2011
Updated	14 Feb 2011
Type	Journal
Year	2010
Where	NAACL
Authors	Maria Shugrina
