Generating Training Data for Medical Dictations

In automatic speech recognition (ASR) enabled applications for medical dictations, corpora of literal transcriptions of speech are critical for training both speaker-independent and speaker-adapted acoustic models. Obtaining these transcriptions is both costly and time-consuming. Non-literal transcriptions, on the other hand, are easy to obtain because they are generated in the normal course of a medical transcription operation. This paper presents a method of automatically generating texts that can take the place of literal transcriptions for training acoustic and language models. ATRS is an automatic transcription reconstruction system that can produce near-literal transcriptions with almost no human labor. We will show that (i) adapted acoustic models trained on ATRS data perform as well as or better than adapted acoustic models trained on literal transcriptions (as measured by recognition accuracy) and (ii) language models trained on ATRS data have lower perplexity than language ...
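The abstract compares language models by perplexity. As a rough illustration of what that metric measures, the sketch below trains an add-one-smoothed unigram model and scores held-out text with it. This is not the paper's method or its language model: the unigram model, the function names `train_unigram` and `perplexity`, and the toy sentences are all assumptions chosen only to keep the example small.

```python
import math
from collections import Counter


def train_unigram(tokens):
    """Return a function giving add-one-smoothed unigram probabilities.

    Illustrative assumption: all unseen words share a single extra
    vocabulary slot, so the distribution still sums to 1.
    """
    counts = Counter(tokens)
    total = len(tokens)
    vocab_size = len(counts) + 1  # +1 slot reserved for unseen words

    def prob(word):
        return (counts[word] + 1) / (total + vocab_size)

    return prob


def perplexity(prob, tokens):
    """Perplexity = 2 ** (average negative log2 probability per token)."""
    log_sum = sum(math.log2(prob(word)) for word in tokens)
    return 2 ** (-log_sum / len(tokens))


# Toy data, invented for illustration only.
in_domain = "the patient denies chest pain".split()
out_of_domain = "stocks fell sharply on monday".split()
held_out = "patient denies pain".split()

pp_in = perplexity(train_unigram(in_domain), held_out)
pp_out = perplexity(train_unigram(out_of_domain), held_out)
print(pp_in < pp_out)
```

A model trained on text that resembles the test data assigns it higher probability and therefore lower perplexity, which is the sense in which the abstract's claim (ii) favors one training corpus over another.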

Sergey V. Pakhomov, Michael Schonwetter, Joan Bachenko

Acoustic Models | Language Models | Literal Transcriptions | NAACL 2001 | NAACL 2007 |

» Automatic Generation of Training Data for Brain Tissue Classification from MRI

» VascuSynth: Simulating vascular trees for generating volumetric image data with ground-truth...

» Learning to improve area-under-FROC for imbalanced medical data classification using an ense...

» Cost-minimising strategies for data labelling: optimal stopping and active learning

» Multi-figure Anatomical Objects for Shape Statistics

» A Pervasive Computing System for the Operating Room of the Future

» Auto-Extraction, Representation and Integration of a Diabetes Ontology Using Bayesian Networ...

» Real-time in-situ visual feedback of task performance in mixed environments for learning joi...

Post Info

Added	31 Oct 2010
Updated	31 Oct 2010
Type	Conference
Year	2001
Where	NAACL
Authors	Sergey V. Pakhomov, Michael Schonwetter, Joan Bachenko