This paper presents a staged series of artificial neural networks (ANNs) for phoneme recognition in text-to-speech applications. Contrary to much of the previously published literature, this approach is not restricted to monosyllabic words or to the pronunciation of isolated multisyllabic words; it can readily be embodied in a program that reads a complete text. It also requires no pre-processing to align the letters and phonemes in the training data. The training data are the 2000 most common words in American English. As an illustration, the staged neural network approach is shown to perform very well on a sample text (in this case, the first paragraph of Frank Baum's "The Wonderful Wizard of Oz").
Fabio A. Arciniegas, Mark J. Embrechts