The EPAC Corpus: Manual and Automatic Annotations of Conversational Speech in French Broadcast News

This paper presents the EPAC corpus, which consists of 100 hours of manually transcribed conversational speech together with the outputs of automatic tools (automatic segmentation, transcription, POS tagging, etc.) applied to the entire French ESTER 1 audio corpus, about 1700 hours of audio recordings from radio shows. The corpus was built during the EPAC project, funded by the French Research Agency (ANR) from 2007 to 2010. It significantly increases the amount of easily available, manually transcribed French audio recordings, and it is now included as part of the ESTER 1 corpus in the ELRA catalog at no additional cost. By providing a large set of automatic outputs from speech processing tools, the EPAC corpus should be useful to researchers who want to work on such data without having to develop and run such tools themselves. The automatic annotations are varied: segmentation and speaker diarization, one-best hypotheses from the LIUM automatic speech ...

Yannick Estève, Thierry Bazillon, Jean-Yves

Audio Recording | Automatic Tools | Education | EPAC Corpus | LREC 2010 |

» Genre effects on automatic sentence segmentation of speech A comparison of broadcast news ...

» Onesided measures for evaluating ranked retrieval effectiveness with spontaneous conversat...

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2010
Where	LREC
Authors	Yannick Estève, Thierry Bazillon, Jean-Yves Antoine, Frédéric Béchet, Jérôme Farinas
