The accurate recognition of text that appears in images/videos using analytical character recognition methods is often very difficult, despite the fact that the text might be corr...
We describe an approach to improve Statistical Machine Translation (SMT) performance using multi-lingual, parallel, sentence-aligned corpora in several bridge languages. Our appro...
We propose to analyse semantic similarity in comparable text by matching syntactic trees and labeling the alignments according to one of five semantic similarity relations. We pre...
We describe an HTML web page segmentation algorithm, which is applied to segment online medical journal articles (regular HTML and PDF-Converted-HTML files). The web page content ...
Abstract. Automated language identification of written text is a wellestablished research domain that has received considerable attention in the past. By now, efficient and effecti...