CASIA-CASSIL: a Chinese Telephone Conversation Corpus in Real Scenarios with Multi-leveled Annotation

Download www.lrec-conf.org

CASIA-CASSIL is a large-scale corpus base of naturally occurring Chinese human-human telephone conversations in restricted domains. The first edition consists of 792 90-second conversations in the tourism domain, selected from 7,639 spontaneous telephone recordings made in real scenarios. The corpus is now being annotated with a wide range of linguistic and paralinguistic information at 13 levels: Turns, Speaker Gender, Orthographic Transcription, Chinese Syllable, Chinese Phonetic Transcription, Prosodic Boundary, Stress of Sentence, Non-Speech Sounds, Voice Quality, Topic, Dialog-act and Adjacency Pairs, Ill-formedness, and Expressive Emotion. This rich annotation should be especially useful for studying Chinese spoken language phenomena. This paper describes the whole process of building the conversation corpus, including collecting and selecting the original data, and follow-up steps such as transcribing, annot...

Keyan Zhou, Aijun Li, Zhigang Yin, Chengqing Zong

Corpus Base | Education | Human-human Naturally-occurring Telephone | Large-scale Corpus Base | LREC 2010

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2010
Where	LREC
Authors	Keyan Zhou, Aijun Li, Zhigang Yin, Chengqing Zong
