We present a stochastic finite-state model for segmenting Chinese text into dictionary entries and productively derived words, and providing pronunciations for these words; the method incorporates a class-based model in its treatment of personal names. We also evaluate the system's performance, taking into account the fact that people often do not agree on a single segmentation.

THE PROBLEM

The initial step of any text analysis task is the tokenization of the input into words. For many writing systems, using whitespace as a delimiter for words yields reasonable results. However, for Chinese and other systems where whitespace is not used to delimit words, such trivial schemes will not work. Chinese writing is morphosyllabic (DeFrancis, 1984), meaning that each hanzi ('Chinese character') nearly always represents a single syllable that is usually also a single morpheme. Since in Chinese, as in English, words may be polysyllabic, and since hanzi are written with no in...