Shared Linguistic Resources for the Meeting Domain

This paper describes efforts by the University of Pennsylvania's Linguistic Data Consortium (LDC) to create and distribute shared linguistic resources – including data, annotations, tools and infrastructure – to support the Spring 2007 (RT-07) Rich Transcription Meeting Recognition Evaluation. In addition to making large volumes of training data available to research participants, LDC produced reference transcripts for the NIST Phase II Corpus and the RT-07 conference room evaluation set, which represent a variety of subjects, scenarios and recording conditions. For the 18-hour NIST Phase II Corpus, LDC created quick transcripts, which include automatic segmentation and minimal markup. The 3-hour evaluation corpus required careful verbatim reference transcripts, including manual segmentation and rich markup. The 2007 effort marked the second year of using the XTrans annotation toolkit in the meeting domain. We describe the process of creating transcripts for the RT-07 eva...

Meghan Lammie Glenn, Stephanie Strassel
