Unsupervised optimal phoneme segmentation: Objectives, algorithm and comparisons

15 years 8 months ago

Download www.gavo.t.u-tokyo.ac.jp

Phoneme segmentation is a fundamental problem in many speech recognition and synthesis studies. Unsupervised phoneme segmentation assumes no knowledge of linguistic content or acoustic models, and thus poses a challenging problem. The essential question here is what constitutes the optimal segmentation. This paper formulates the optimal segmentation problem in a probabilistic framework. Using statistical and information-theoretic analysis, we develop three different objective functions, namely Summation of Square Error (SSE), Log Determinant (LD) and Rate Distortion (RD). In particular, the RD function is derived from information rate distortion theory and can be related to human signal perception mechanisms. We introduce a time-constrained agglomerative clustering algorithm to find the optimal segmentations. We also propose an efficient method to implement the algorithm by using integration functions. We carry out experiments on the TIMIT database to compare the above three objective functions. The res...
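To make the idea of time-constrained agglomerative clustering concrete, the following is a minimal sketch and not the authors' implementation. It assumes the SSE objective only, a greedy bottom-up merge of adjacent segments, and it reads "integration functions" as cumulative (prefix) sums that give the SSE of any segment in constant time per feature dimension; that reading is an assumption. The function name `segment_sse` and its parameters are hypothetical.

```python
import numpy as np


def segment_sse(features, num_segments):
    """Greedy time-constrained agglomerative segmentation (SSE objective).

    Hypothetical helper: returns boundary indices [0, ..., T] such that
    segment k covers frames bounds[k] .. bounds[k + 1] - 1.
    """
    x = np.asarray(features, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D signal as T frames of dimension 1
    t = x.shape[0]
    if not 1 <= num_segments <= t:
        raise ValueError("num_segments must be between 1 and the frame count")

    # Prefix sums (assumed stand-in for the "integration functions"):
    # s1[j] = sum of frames 0..j-1, s2[j] = sum of squared norms of frames 0..j-1.
    s1 = np.vstack([np.zeros((1, x.shape[1])), np.cumsum(x, axis=0)])
    s2 = np.concatenate([[0.0], np.cumsum((x ** 2).sum(axis=1))])

    def sse(i, j):
        # Sum of squared distances to the segment mean for frames i..j-1.
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total @ total / (j - i)

    # Start with one segment per frame; only adjacent segments may merge,
    # which is the time constraint.
    bounds = list(range(t + 1))
    while len(bounds) - 1 > num_segments:
        best_k, best_cost = None, np.inf
        for k in range(1, len(bounds) - 1):
            a, b, c = bounds[k - 1], bounds[k], bounds[k + 1]
            # Increase in total SSE if the boundary at b is removed.
            cost = sse(a, c) - sse(a, b) - sse(b, c)
            if cost < best_cost:
                best_k, best_cost = k, cost
        del bounds[best_k]
    return bounds
```

For example, `segment_sse([0, 0, 0, 5, 5, 5, 5, 9, 9], 3)` returns `[0, 3, 7, 9]`, placing boundaries where the signal level changes. A greedy merge like this is not guaranteed to reach the global optimum of the objective, and the LD and RD objectives named in the abstract are not shown here.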

Yu Qiao, Naoya Shimomura, Nobuaki Minematsu
