BAStat : New Statistical Resources at the Bavarian Archive for Speech Signals


Download www.phonetik.uni-muenchen.de

The Bavarian Archive for Speech Signals (BAS) has released a new type of language resource, 'BAStat'. In contrast to primary resources such as speech and text corpora, BAStat comprises statistical estimates derived from a number of primary resources: first- and second-order occurrence probabilities of phones, syllables, and words; duration statistics; probabilities of pronunciation variants of words; and probabilities of context information. Unlike other statistical speech resources, BAStat is based solely on recordings of conversational German and therefore models spoken language. It consists of 7-bit ASCII tables and matrices to maximize interoperability between platforms and can be downloaded from the BAS website. This paper gives a detailed description of the empirical basis and the data types contained, offers some interesting interpretations, and briefly compares BAStat to the text-based statistical resource CELEX.
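To make the notion of first- and second-order occurrence probabilities concrete, here is a minimal sketch in Python. It assumes that "first order" means the relative frequency of a unit and "second order" means the probability of a unit given its immediate predecessor; the abstract does not define either term, so this reading is an assumption. The function name `occurrence_probabilities` and the toy phone sequence are hypothetical and are not taken from BAStat.

```python
from collections import Counter


def occurrence_probabilities(units):
    """Estimate occurrence probabilities by relative frequency.

    Assumed definitions (not confirmed by the abstract):
      first_order[u]       = count(u) / total number of units
      second_order[(a, b)] = count(a followed by b) / count(a as a predecessor)
    """
    unigram_counts = Counter(units)
    bigram_counts = Counter(zip(units, units[1:]))
    # Only units that have a successor can act as a predecessor.
    predecessor_counts = Counter(units[:-1])

    total = len(units)
    first_order = {u: c / total for u, c in unigram_counts.items()}
    second_order = {
        (a, b): c / predecessor_counts[a]
        for (a, b), c in bigram_counts.items()
    }
    return first_order, second_order


# Toy phone sequence, invented for illustration (not BAStat data).
phones = ["h", "a", "l", "o", "h", "a", "s", "t"]
first, second = occurrence_probabilities(phones)

# Print the first-order estimates as a plain ASCII, tab-separated table.
for unit in sorted(first):
    print(f"{unit}\t{first[unit]:.3f}")
```

The same function applies unchanged to syllable or word sequences, since it only requires a list of hashable units.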

Florian Schiel


Education | LREC 2010 | Primary Resources | Resource | Statistical Speech Resources |


Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2010
Where	LREC
Authors	Florian Schiel
