It is commonly believed that word segmentation accuracy is monotonically related to retrieval performance in Chinese information retrieval. In this paper we show that, for Chinese...
Fuchun Peng, Xiangji Huang, Dale Schuurmans, Nick ...
Out of vocabulary (OOV) words are problematic for cross language information retrieval. One way to deal with OOV words when the two languages have different alphabets, is to trans...
We present our hybrid system for the PAN challenge at CLEF 2010. Our system performs plagiarism detection for translated and non-translated externally as well as intrinsically plag...
Markus Muhr, Roman Kern, Mario Zechner, Michael Gr...
Sentence segmentation and punctuation recovery are critical components for effective spoken language translation (SLT). In this paper we describe our recent work on sentence segme...
Matthias Paulik, Sharath Rao, Ian R. Lane, Stephan...
Word searching and indexing in historical document collections is a challenging problem because, characters in these documents are often touching or broken due to degradation/agei...