Jurilinguistic Engineering in Cantonese Chinese: An N-gram-based Speech to Text Transcription System

A Cantonese Chinese transcription system to automatically convert stenograph code to Chinese characters is reported. The major challenge in developing such a system is the critical homocode problem because of homonymy. The statistical N-gram model is used to compute the best combination of characters. Supplemented with a 0.85-million-character corpus of domain-specific training data and enhancement measures, the bigram and trigram implementations achieve 95% and 96% accuracy respectively, as compared with 78% accuracy in the baseline model. The system performance is comparable with other advanced Chinese Speech-to-Text input applications under development. The system meets an urgent need of the Judiciary of post-1997 Hong Kong.

Keywords: Speech to Text, Statistical Modelling, Cantonese, Chinese, Language Engineering
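
To illustrate how an N-gram model can resolve the homocode problem described in the abstract, the following is a minimal sketch of bigram decoding with a Viterbi search. It is not the authors' implementation. The romanized syllables standing in for stenograph codes, the candidate table `CANDIDATES`, the tiny training list `TRAINING`, the add-one smoothing, and the function names are all assumptions made for illustration.

```python
import math
from collections import defaultdict

# Hypothetical toy data: each code maps to several homophonous characters.
# Romanized syllables stand in for real stenograph codes (an assumption).
CANDIDATES = {
    "faat3": ["法", "發"],
    "ting4": ["庭", "停"],
}

# Toy training text, invented for this sketch (not the paper's corpus).
TRAINING = ["法庭", "法官", "發現", "停車", "法庭"]


def train_bigrams(sentences):
    """Count character bigrams, with <s> marking the start of each sentence."""
    context_counts = defaultdict(int)  # how often a character appears as a left context
    bigram_counts = defaultdict(int)
    for sentence in sentences:
        chars = ["<s>"] + list(sentence)
        for prev, cur in zip(chars, chars[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    vocab_size = len({c for s in sentences for c in s})
    return context_counts, bigram_counts, vocab_size


def log_prob(prev, cur, model):
    """Add-one smoothed log P(cur | prev); the smoothing choice is an assumption."""
    context_counts, bigram_counts, vocab_size = model
    return math.log((bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + vocab_size))


def decode(codes, candidates, model):
    """Viterbi search: pick the character sequence with the highest bigram score."""
    # best[c] = (log score, path) of the best sequence ending in character c
    best = {"<s>": (0.0, [])}
    for code in codes:
        nxt = {}
        for cur in candidates[code]:
            score, path = max(
                ((s + log_prob(prev, cur, model), p) for prev, (s, p) in best.items()),
                key=lambda t: t[0],
            )
            nxt[cur] = (score, path + [cur])
        best = nxt
    return "".join(max(best.values(), key=lambda t: t[0])[1])


model = train_bigrams(TRAINING)
print(decode(["faat3", "ting4"], CANDIDATES, model))  # 法庭
```

In this toy setting, both codes are ambiguous on their own, and the bigram context selects 法庭 ("court") because that pair is frequent in the training text. A trigram variant would condition each character on the two preceding characters instead of one.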

Cantonese Chinese Transcription | Chinese Characters | COLING 2000 | COLING 2008 | Statistical N-gram Model |

Post Info

Added	01 Nov 2010
Updated	01 Nov 2010
Type	Conference
Year	2000
Where	COLING
Authors	Benjamin K. Tsou, K. K. Sin, Samuel W. K. Chan, Tom B. Y. Lai, Caesar Suen Lun, K. T. Ko, Gary K. K. Chan, Lawrence Y. L. Cheung
