Integrating Joint n-gram Features into a Discriminative Training Framework

15 years 4 months ago

Download aclweb.org

Phonetic string transduction problems, such as letter-to-phoneme conversion and name transliteration, have recently received much attention in the NLP community. In the past few years, two methods have come to dominate as solutions to supervised string transduction: generative joint n-gram models, and discriminative sequence models. Both approaches benefit from their ability to consider large, flexible spans of source context when making transduction decisions. However, they encode this context in different ways, providing their respective models with different information. To combine the strengths of these two systems, we include joint n-gram features inside a state-of-the-art discriminative sequence model. We evaluate our approach on several letter-to-phoneme and transliteration data sets. Our results indicate an improvement in overall performance with respect to both the joint n-gram approach and traditional feature sets for discriminative models.

Sittichai Jiampojamarn, Colin Cherry, Grzegorz Kon

Real-time Traffic