CLEAR
2007
Springer

Shared Linguistic Resources for the Meeting Domain

This paper describes efforts by the University of Pennsylvania's Linguistic Data Consortium to create and distribute shared linguistic resources – including data, annotations, tools and infrastructure – to support the Spring 2007 (RT-07) Rich Transcription Meeting Recognition Evaluation. In addition to making available large volumes of training data to research participants, LDC produced reference transcripts for the NIST Phase II Corpus and RT-07 conference room evaluation set, which represent a variety of subjects, scenarios and recording conditions. For the 18-hour NIST Phase II Corpus, LDC created quick transcripts which include automatic segmentation and minimal markup. The 3-hour evaluation corpus required the creation of careful verbatim reference transcripts including manual segmentation and rich markup. The 2007 effort marked the second year of using the XTrans annotation tool kit in the meeting domain. We describe the process of creating transcripts for the RT-07 eva...
Meghan Lammie Glenn, Stephanie Strassel
Type Conference
Year 2007
Where CLEAR
Authors Meghan Lammie Glenn, Stephanie Strassel