Handheld devices generally provide the facility of text input through keys that are an inconvenient and slow way of input. Digitizing tablets and light pens, on the other hand pro...
The processing of Japanese text is complicated by the fact that there are no word delimiters. To segment Japanese text, systems typically use knowledge-based methods and large lex...
Authorship analysis of electronic texts assists digital forensics and anti-terror investigation. Author identification can be seen as a single-label multi-class text categorizatio...
The goal of the DARPA MADCAT (Multilingual Automatic Document Classification Analysis and Translation) Program is to automatically convert foreign language text images into Englis...
We propose a self-supervised word-segmentation technique for Chinese information retrieval. This method combines the advantages of traditional dictionary based approaches with cha...
Fuchun Peng, Xiangji Huang, Dale Schuurmans, Nick ...