Letter to Sound Rules for Accented Lexicon Compression



This paper presents trainable methods for generating letter to sound rules from a given lexicon, for use in pronouncing out-of-vocabulary words and as a method for lexicon compression. Because the relationship between a string of letters and the string of phonemes representing its pronunciation is not trivial for many languages, we discuss two alignment procedures, one fully automatic and one hand-seeded, which produce reasonable alignments of letters to phones (or epsilon). Top Down Induction Tree models are trained on the aligned entries. We show that combined phoneme/stress prediction is better than separate prediction processes, and better still when the model also includes the last phonemes transcribed and part-of-speech information. For the lexicons we have tested, our models have a word accuracy (including stress) of 78% for OALD, 62% for CMU and 94% for BRULEX, allowing substantial reduction in the size of these lexicons.
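As a rough illustration of what "training on the aligned entries" could look like, the sketch below turns one aligned entry into per-letter training rows: a window of neighbouring letters, the last phones already transcribed, and an optional part-of-speech tag as features, with the combined phone/stress symbol as the target. The function name `training_rows`, the window and history sizes, the `_` epsilon symbol, the `#` padding symbol, and the choice to count epsilons as part of the phone history are all assumptions made for this example, not details taken from the paper.

```python
from typing import Dict, List, Optional, Tuple

EPSILON = "_"  # assumed symbol meaning "this letter produces no phone"
PAD = "#"      # assumed symbol for positions outside the word


def training_rows(
    letters: str,
    phones: List[str],
    pos: Optional[str] = None,
    window: int = 3,   # assumed number of context letters on each side
    history: int = 2,  # assumed number of previous phones used as features
) -> List[Tuple[Dict[str, str], str]]:
    """Turn one aligned lexicon entry into (features, target) rows.

    `phones` must hold exactly one symbol per letter: a phone, optionally
    carrying a stress mark (e.g. "eh1"), or EPSILON.
    """
    if len(letters) != len(phones):
        raise ValueError("alignment must give one phone (or epsilon) per letter")

    padded = PAD * window + letters + PAD * window
    rows = []
    for i, target in enumerate(phones):
        feats: Dict[str, str] = {}
        # Letter context: L-3 .. L+0 (the letter itself) .. L+3.
        for offset in range(-window, window + 1):
            feats[f"L{offset:+d}"] = padded[i + window + offset]
        # Previously transcribed symbols (epsilons included), padded at word start.
        for k in range(1, history + 1):
            feats[f"P-{k}"] = phones[i - k] if i - k >= 0 else PAD
        if pos is not None:
            feats["pos"] = pos
        rows.append((feats, target))
    return rows


# Hypothetical aligned entry: seven letters, seven phone-or-epsilon symbols.
rows = training_rows("checked", ["ch", EPSILON, "eh1", "k", EPSILON, EPSILON, "t"], pos="v")
```

Rows of this shape could then be fed to any decision-tree learner that accepts categorical features. Predicting a combined symbol such as `eh1` in one step, rather than the phone and the stress separately, mirrors the combined phoneme/stress prediction the abstract reports as the better option.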

V. Pagel, Kevin Lenzo, Alan W. Black


CORR 1998 | Education | Induction Tree Models | Lexicon Compression | Trainable Methods |


Added	22 Dec 2010
Updated	22 Dec 2010
Type	Journal
Year	1998
Where	CORR
Authors	V. Pagel, Kevin Lenzo, Alan W. Black
