Sciweavers

144 search results - page 15 / 29
» Improved Source-Channel Models for Chinese Word Segmentation
ACL
2008
13 years 9 months ago
Semi-Supervised Sequential Labeling and Segmentation Using Giga-Word Scale Unlabeled Data
This paper provides evidence that the use of more unlabeled data in semi-supervised learning can improve the performance of Natural Language Processing (NLP) tasks, such as part-o...
Jun Suzuki, Hideki Isozaki
ICPR
2010
IEEE
13 years 5 months ago
Improved Mandarin Keyword Spotting Using Confusion Garbage Model
This paper presents an improved acoustic keyword spotting (KWS) algorithm using a novel confusion garbage model in Mandarin conversational speech. Observing the KWS corpus, we foun...
Shilei Zhang, Zhiwei Shuang, Qin Shi, Yong Qin
EMNLP
2007
13 years 9 months ago
Mandarin Part-of-Speech Tagging and Discriminative Reranking
We present in this paper methods to improve HMM-based part-of-speech (POS) tagging of Mandarin. We model the emission probability of an unknown word using all the characters in th...
Zhongqiang Huang, Mary P. Harper, Wen Wang
NAACL
2003
13 years 9 months ago
A Context-Sensitive Homograph Disambiguation in Thai Text-to-Speech Synthesis
Homograph ambiguity is an original issue in Text-to-Speech (TTS). To disambiguate homograph, several efficient approaches have been proposed such as part-of-speech (POS) n-gram, B...
Virongrong Tesprasit, Paisarn Charoenpornsawat, Vi...
NLPRS
2001
Springer
14 years 9 days ago
A Hierarchical EM Approach to Word Segmentation
We propose a simple two-level hierarchical probability model for unsupervised word segmentation. By treating words as strings composed of morphemes/phonemes which are themselves c...
Fuchun Peng, Dale Schuurmans