We propose an algorithm that enables joint Viterbi decoding of multiple independent audio recordings of a word in order to derive its pronunciation. Experiments show that this method yields better pronunciation estimates and higher word recognition accuracy than those obtained either from a single example of the word or from conventional approaches that estimate pronunciations from multiple examples.

ICASSP 2009

This work may not be copied or reproduced in whole or in part for any commercial purpose. Permission to copy in whole or in part without payment of fee is granted for nonprofit educational and research purposes provided that all such whole or partial copies include the following: a notice that such copying is by permission of Mitsubishi Electric Research Laboratories, Inc.; an acknowledgment of the authors and individual contributions to the work; and all applicable portions of the copyright notice. Copying, reproduction, or republishing for any other purpose shall require a license with...
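To make "joint Viterbi decoding of multiple recordings" concrete, here is a minimal sketch for two recordings. It is a hypothetical formulation of our own, not necessarily the algorithm of this paper. It makes four assumptions. First, both recordings are generated by one shared HMM whose state path is common to both. Second, a decoding path moves monotonically through the grid of frame pairs (t1, t2), advancing recording 1, recording 2, or both at each step. Third, the state at each grid point emits only the frames newly consumed by that step. Fourth, all scores are log probabilities. The function name `joint_viterbi` and its parameters (`log_a`, `log_pi`, `log_b1`, `log_b2`) are illustrative names, not an existing API.

```python
import math

NEG_INF = float("-inf")


def joint_viterbi(log_a, log_pi, log_b1, log_b2):
    """Jointly align two recordings to one shared HMM state path (illustrative sketch).

    log_a[i][j]  : log transition probability from state i to state j
    log_pi[j]    : log initial probability of state j
    log_b1[t][j] : log likelihood of frame t of recording 1 under state j
    log_b2[t][j] : log likelihood of frame t of recording 2 under state j

    Returns (best_log_score, path), where path is a list of (t1, t2, state)
    grid points running from (0, 0) to (T1 - 1, T2 - 1).
    """
    num_states = len(log_pi)
    T1, T2 = len(log_b1), len(log_b2)
    # delta[t1][t2][j]: best log score of any path ending at grid point (t1, t2) in state j
    delta = [[[NEG_INF] * num_states for _ in range(T2)] for _ in range(T1)]
    # back[t1][t2][j]: predecessor (t1', t2', i) on that best path
    back = [[[None] * num_states for _ in range(T2)] for _ in range(T1)]

    # Both recordings start together: the first state emits the first frame of each.
    for j in range(num_states):
        delta[0][0][j] = log_pi[j] + log_b1[0][j] + log_b2[0][j]

    # Row-major order guarantees every predecessor grid point is already filled in.
    for t1 in range(T1):
        for t2 in range(T2):
            if t1 == 0 and t2 == 0:
                continue
            for j in range(num_states):
                best, arg = NEG_INF, None
                # Monotone moves: advance recording 1, recording 2, or both.
                for d1, d2 in ((1, 0), (0, 1), (1, 1)):
                    p1, p2 = t1 - d1, t2 - d2
                    if p1 < 0 or p2 < 0:
                        continue
                    # State j emits only the frames this move newly consumes.
                    emit = (log_b1[t1][j] if d1 else 0.0) + (log_b2[t2][j] if d2 else 0.0)
                    for i in range(num_states):
                        score = delta[p1][p2][i] + log_a[i][j] + emit
                        if score > best:
                            best, arg = score, (p1, p2, i)
                delta[t1][t2][j] = best
                back[t1][t2][j] = arg

    # Pick the best final state, then follow back-pointers to the origin.
    last = delta[T1 - 1][T2 - 1]
    j = max(range(num_states), key=lambda s: last[s])
    best_final = last[j]
    t1, t2 = T1 - 1, T2 - 1
    path = [(t1, t2, j)]
    while back[t1][t2][j] is not None:
        t1, t2, j = back[t1][t2][j]
        path.append((t1, t2, j))
    path.reverse()
    return best_final, path
```

Under this formulation, the running time grows as T1 × T2 × S² for S states. Each additional recording adds one dimension to the frame grid, so the cost grows multiplicatively with the number of recordings jointly decoded.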