In this paper, we describe a reversible letter-to-sound/sound-to-letter generation system based on an approach that combines a rule-based formalism with data-driven techniques. We adopt a probabilistic parsing strategy to provide a hierarchical lexical analysis of a word, including information such as morphology, stress, syllabification, phonemics and graphemics. Long-distance constraints are propagated by enforcing local constraints throughout the hierarchy. Our training and testing corpora are derived from the high-frequency portion of the Brown Corpus (10,000 words), augmented with markers indicating stress and word morphology. We evaluated performance on an unseen test set. The percentages of nonparsable words for letter-to-sound and sound-to-letter generation were 6% and 5%, respectively. Of the remaining
Helen M. Meng, Stephanie Seneff, Victor Zue