Many uses, many annotations for large speech corpora: Switchboard and TDT as case studies

This paper discusses the challenges that arise when large speech corpora receive an ever-broadening range of diverse and distinct annotations. Two case studies of this process are presented: the Switchboard Corpus of telephone conversations and the TDT2 corpus of broadcast news. Switchboard has undergone two independent transcriptions and various types of additional annotation, all carried out as separate projects that were dispersed both geographically and chronologically. The TDT2 corpus has also received a variety of annotations, but all directly created or managed by a core group. In both cases, issues arise involving the propagation of repairs, consistency of references, and the ability to integrate annotations having different formats and levels of detail. We describe a general framework whereby these issues can be addressed successfully.

David Graff, Steven Bird

Annotations | CORR 2000 | Distinct Annotations | Education | Switchboard Corpus |

Added	17 Dec 2010
Updated	17 Dec 2010
Type	Journal
Year	2000
Where	CORR
Authors	David Graff, Steven Bird
