Sciweavers

ICASSP
2011
IEEE

Unsupervised determination of efficient Korean LVCSR units using a Bayesian Dirichlet process model

Korean is an agglutinative language that does not have explicit word boundaries. It is also a highly inflective language that exhibits severe coarticulation effects. These characteristics pose a challenge in developing large-vocabulary continuous speech recognition (LVCSR) systems. Many existing Korean LVCSR systems attempt to overcome these difficulties by defining a set of “word” units using morphological analysis (rule-based) or statistical methods. These approaches usually require a great deal of linguistic knowledge or at least some explicit information about the statistical distribution of the units. However, exceptions or uncommon words (e.g., foreign proper nouns) still exist that cannot be covered by rules alone. In this paper, we investigate the use of an unsupervised, nonparametric Bayesian approach to automatically determining efficient units for a Korean LVCSR system. Specifically, we utilize a Dirichlet process model trained using Bayesian inference through bloc...
Added 20 Aug 2011
Updated 20 Aug 2011
Type Conference
Year 2011
Where ICASSP
Authors Sakriani Sakti, Andrew M. Finch, Ryosuke Isotani, Hisashi Kawai, Satoshi Nakamura