Sciweavers

EMNLP
2010
Enhancing Domain Portability of Chinese Segmentation Model Using Chi-Square Statistics and Bootstrapping
Almost all Chinese language processing tasks involve word segmentation of the language input as their first steps, thus robust and reliable segmentation techniques are always requ...
Baobao Chang, Dongxu Han
CORR
2002
Springer
Mostly-Unsupervised Statistical Segmentation of Japanese Kanji Sequences
Given the lack of word delimiters in written Japanese, word segmentation is generally considered a crucial first step in processing Japanese texts. Typical Japanese segmentation a...
Rie Kubota Ando, Lillian Lee
COLING
2002
Investigating the Relationship between Word Segmentation Performance and Retrieval Performance in Chinese IR
It is commonly believed that word segmentation accuracy is monotonically related to retrieval performance in Chinese information retrieval. In this paper we show that, for Chinese...
Fuchun Peng, Xiangji Huang, Dale Schuurmans, Nick ...
COLING
1996
The Automatic Extraction of Open Compounds from Text Corpora
This paper describes a new method for extracting open compounds (uninterrupted sequences of words) from text corpora of languages, such as Thai, Japanese and Korean, that exhibit un...
Virach Sornlertlamvanich, Hozumi Tanaka
COLING
1994
An IBM-PC Environment For Chinese Corpus Analysis
This paper describes a set of computer programs for Chinese corpus analysis. These programs include (1) extraction of different characters, bigrams and words; (2) word segmentatio...
Robert Wing Pong Luk
ACL
1997
A Trainable Rule-based Algorithm for Word Segmentation
This paper presents a trainable rule-based algorithm for performing word segmentation. The algorithm provides a simple, language-independent alternative to large-scale lexical-bas...
David D. Palmer
COLING
2000
Automatic Corpus-Based Thai Word Extraction with the C4.5 Learning Algorithm
"Word" is difficult to define in languages that do not mark word boundaries explicitly, such as Thai. Traditional methods of defining words for this kind of language...
Virach Sornlertlamvanich, Tanapong Potipiti, Thats...
EMNLP
2004
Chinese Part-of-Speech Tagging: One-at-a-Time or All-at-Once? Word-Based or Character-Based?
Chinese part-of-speech (POS) tagging assigns one POS tag to each word in a Chinese sentence. However, since words are not demarcated in a Chinese sentence, Chinese POS tagging req...
Hwee Tou Ng, Jin Kiat Low
ACL
2006
Contextual Dependencies in Unsupervised Word Segmentation
Developing better methods for segmenting continuous text into words is important for improving the processing of Asian languages, and may shed light on how humans learn to segment...
Sharon Goldwater, Thomas L. Griffiths, Mark Johnso...
ACL
2004
Adaptive Chinese Word Segmentation
This paper presents a Chinese word segmentation system which can adapt to different domains and standards. We first present a statistical framework where domain-specific words are...
Jianfeng Gao, Andi Wu, Cheng-Ning Huang, Hongqiao ...