Plagiarism, the unacknowledged reuse of text, does not end at language boundaries. Cross-language plagiarism occurs if a text is translated from a fragment written in a different ...
While strings and syntax trees are used by the Natural Language Processing community to represent the structure of spoken languages, these encodings are difficult to adapt to a si...
This work explores the problem of cross-lingual pairwise similarity, where the task is to extract similar pairs of documents across two different languages. Solutions to this pro...
Large amounts of low- to medium-quality English texts are now being produced by machine translation (MT) systems, optical character readers (OCR), and non-native speakers of Engli...
This paper revisits the lattice-based thesaurus models which Margaret Masterman used for machine translation in the 1950’s and 60’s. Masterman’s notions are mapped onto moder...