Unsupervised determination of efficient Korean LVCSR units using a Bayesian Dirichlet process model

Korean is an agglutinative language that does not have explicit word boundaries. It is also a highly inflective language that exhibits severe coarticulation effects. These characteristics pose a challenge in developing large-vocabulary continuous speech recognition (LVCSR) systems. Many existing Korean LVCSR systems attempt to overcome these difficulties by defining a set of “word” units using morphological analysis (rule-based) or statistical methods. These approaches usually require a great deal of linguistic knowledge, or at least some explicit information about the statistical distribution of the units. However, some exceptions and uncommon words (e.g., foreign proper nouns) cannot be covered by rules alone. In this paper, we investigate the use of an unsupervised, nonparametric Bayesian approach to automatically determine efficient units for a Korean LVCSR system. Specifically, we utilize a Dirichlet process model trained using Bayesian inference through bloc...
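As a rough illustration of how a Dirichlet process (DP) model can discover units in unsegmented text without rules, the sketch below implements a simple collapsed Gibbs sampler for a DP unigram segmentation model over Korean syllable strings. It is not the authors' system, and their sampler may differ. It resamples one boundary position at a time using the Chinese restaurant process predictive probability (n_w + α·P0(w)) / (n + α), where n_w is the current count of unit w and n the total unit count, both excluding the unit(s) being resampled. Everything here is an assumption made for illustration: the function names `gibbs_segment`, `split_at`, and `base_prob`, the base distribution P0 (geometric length prior times a uniform distribution over observed syllables), the concentration α, and the toy corpus.

```python
import random
from collections import Counter


def base_prob(word, n_symbols, p_stop=0.5):
    """Hypothetical base distribution P0 over candidate units.

    Geometric prior on length (in syllables) times a uniform
    distribution over the n_symbols distinct syllables observed.
    """
    return p_stop * (1 - p_stop) ** (len(word) - 1) * (1.0 / n_symbols) ** len(word)


def split_at(utt, bounds):
    """Cut an utterance at every internal boundary position in `bounds`."""
    cuts = [0] + sorted(bounds) + [len(utt)]
    return [utt[a:b] for a, b in zip(cuts, cuts[1:])]


def gibbs_segment(corpus, alpha=1.0, iters=100, seed=0):
    """Collapsed Gibbs sampler for a DP unigram segmentation model.

    Resamples one boundary position at a time (point-wise), conditioned
    on every other unit in the corpus, and returns the final segmentation.
    """
    rng = random.Random(seed)
    n_symbols = len(set("".join(corpus)))

    def p0(w):
        return base_prob(w, n_symbols)

    # Random initial boundaries; counts hold the current token count of each unit.
    bounds = [{i for i in range(1, len(u)) if rng.random() < 0.5} for u in corpus]
    counts = Counter(w for u, b in zip(corpus, bounds) for w in split_at(u, b))
    total = sum(counts.values())

    for _ in range(iters):
        for u, b in zip(corpus, bounds):
            for i in range(1, len(u)):
                left = max((j for j in b if j < i), default=0)
                right = min((j for j in b if j > i), default=len(u))
                w1, w2, w3 = u[left:right], u[left:i], u[i:right]
                # Remove the unit(s) that currently cover position i.
                removed = [w2, w3] if i in b else [w1]
                for w in removed:
                    counts[w] -= 1
                total -= len(removed)
                # CRP predictive probability of one merged unit vs. two units;
                # the second factor also counts w2 if it was just generated.
                p_merge = (counts[w1] + alpha * p0(w1)) / (total + alpha)
                p_split = ((counts[w2] + alpha * p0(w2)) / (total + alpha)
                           * (counts[w3] + (w2 == w3) + alpha * p0(w3))
                           / (total + 1 + alpha))
                if rng.random() < p_split / (p_split + p_merge):
                    b.add(i)
                    added = [w2, w3]
                else:
                    b.discard(i)
                    added = [w1]
                for w in added:
                    counts[w] += 1
                total += len(added)
    return [split_at(u, b) for u, b in zip(corpus, bounds)]


if __name__ == "__main__":
    # Toy, unspaced Korean syllable strings (illustrative only).
    corpus = ["나는학교에간다", "나는집에간다", "학교에간다"]
    for seg in gibbs_segment(corpus, alpha=1.0, iters=200, seed=1):
        print(" ".join(seg))
```

In this sketch, a larger α makes the sampler more willing to create novel units drawn from P0, while a smaller α favors reusing units that are already frequent. On a toy corpus this small, the resulting segmentation only shows that the sampler runs, not that it finds linguistically sensible units.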

Sakriani Sakti, Andrew M. Finch, Ryosuke Isotani, Hisashi Kawai, Satoshi Nakamura

ICASSP 2011 | Linguistic Knowledge | Morphological Analysis | Proper Nouns | Signal Processing

Post Info

Added	20 Aug 2011
Updated	20 Aug 2011
Type	Conference
Year	2011
Where	ICASSP
Authors	Sakriani Sakti, Andrew M. Finch, Ryosuke Isotani, Hisashi Kawai, Satoshi Nakamura

Sciweavers
