
COLING
2008

Modeling Chinese Documents with Topical Word-Character Models

Because Chinese text is written without word boundaries, recognizing Chinese words is analogous to recognizing collocations in English, with characters playing the role of words and words playing the role of collocations. However, existing topic models that handle collocations share a common limitation: instead of assigning a topic directly to a collocation, they take the topic of one word within the collocation as the topic of the whole collocation. This is unsatisfactory for topic modeling of Chinese documents. We therefore propose a topical word-character model (TWC), which allows two distinct types of topics: word topics and character topics. We evaluated TWC both qualitatively and quantitatively, and the results show that it is a powerful and promising topic model.
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2008
Where COLING
Authors Wei Hu, Nobuyuki Shimizu, Hiroshi Nakagawa, Huanye Sheng