Sciweavers

31 search results - page 4 / 7
» Enhancing Chinese Word Segmentation Using Unlabeled Data
Sort
View
TALIP
2002
108views more  TALIP 2002»
13 years 7 months ago
Toward a unified approach to statistical language modeling for Chinese
This paper presents a unified approach to Chinese statistical language modeling (SLM). Applying SLM techniques like trigram language models to Chinese is challenging because (1) t...
Jianfeng Gao, Joshua Goodman, Mingjing Li, Kai-Fu ...
ICPR
2002
IEEE
14 years 8 months ago
Incorporating Conditional Independence Assumption with Support Vector Machines to Enhance Handwritten Character Segmentation Per
Learning Bayesian Belief Networks (BBN) from corpora and incorporating the extracted inferring knowledge with a Support Vector Machines (SVM) classifier has been applied to charac...
Manolis Maragoudakis, Ergina Kavallieratou, Nikos ...
LREC
2010
188views Education» more  LREC 2010»
13 years 8 months ago
How Large a Corpus Do We Need: Statistical Method Versus Rule-based Method
We investigate the impact of input data scale in corpus-based learning using a study style of Zipf's law. In our research, Chinese word segmentation is chosen as the study ca...
Hai Zhao, Yan Song, Chunyu Kit
KDD
2009
ACM
211views Data Mining» more  KDD 2009»
14 years 8 months ago
Address standardization with latent semantic association
Address standardization is a very challenging task in data cleansing. To provide better customer relationship management and business intelligence for customer-oriented cooperates...
Honglei Guo, Huijia Zhu, Zhili Guo, Xiaoxun Zhang,...
ACL
2009
13 years 5 months ago
Mining Bilingual Data from the Web with Adaptively Learnt Patterns
Mining bilingual data (including bilingual sentences and terms1 ) from the Web can benefit many NLP applications, such as machine translation and cross language information retrie...
Long Jiang, Shiquan Yang, Ming Zhou, Xiaohua Liu, ...