There is no blank to mark word boundaries in Chinese text. As a result, identifying words is difficult, because of segmentation ambiguities and occurrences of unknown words. Conve...
This paper describes a system for handwritten Chinese text recognition integrating language model. On a text line image, the system generates character segmentation and word segme...
We propose an unsupervised segmentation method based on an assumption about language data: that the increasing point of entropy of successive characters is the location of a word ...
This paper reports on a study involving the automatic extraction of Chinese legal terms. We used a word segmented corpus of Chinese court judgments to extract salient legal expres...
This paper presents a unified approach to Chinese statistical language modeling (SLM). Applying SLM techniques like trigram language models to Chinese is challenging because (1) t...