Sciweavers

LREC
2010
13 years 5 months ago
Language Identification of Short Text Segments with N-gram Models
There are many accurate methods for language identification of long text samples, but identification of very short strings still presents a challenge. This paper studies a languag...
Tommi Vatanen, Jaakko J. Väyrynen, Sami Virpi...
IPM
2008
13 years 10 months ago
Author identification: Using text sampling to handle the class imbalance problem
Authorship analysis of electronic texts assists digital forensics and anti-terror investigation. Author identification can be seen as a single-label multi-class text categorizatio...
Efstathios Stamatatos
CICLING
2009
Springer
14 years 2 months ago
Language Identification on the Web: Extending the Dictionary Method
Automated language identification of written text is a well-established research domain that has received considerable attention in the past. By now, efficient and effecti...
Radim Rehurek, Milan Kolkus
ISMIS
2005
Springer
14 years 4 months ago
A Machine Text-Inspired Machine Learning Approach for Identification of Transmembrane Helix Boundaries
In this paper, we adapt a statistical learning approach, inspired by automated topic segmentation techniques in speech-recognized documents to the challenging protein segmentation ...
Betty Yee Man Cheng, Jaime G. Carbonell, Judith Kl...
LREC
2010
14 years 9 days ago
Adapting Chinese Word Segmentation for Machine Translation Based on Short Units
In Chinese texts, words composed of single or multiple characters are not separated by spaces, unlike most western languages. Therefore Chinese word segmentation is considered an ...
Yiou Wang, Kiyotaka Uchimoto, Jun'ichi Kazama, Can...