—For historical documents, available transcriptions typically are inaccurate when compared with the scanned document images. Not only the position of the words and sentences are ...
We report an automatic feature discovery method that achieves results comparable to a manually chosen, larger feature set on a document image content extraction problem: the locat...
With the advent of XML we have seen a renewed interest in methods for computing the difference between trees. Methods that include heuristic elements play an important role in pr...
Tancred Lindholm, Jaakko Kangasharju, Sasu Tarkoma
This paper presents algorithms for efficiently computing the covariance matrix for features that form sub-windows in a large multidimensional image. For example, several image proc...
Script identification is required for a multilingual OCR system. In this paper, we present a novel and efficient technique for Bangla/English script identification with application...