Logical entity recognition in heterogeneous collections of document page images remains a challenging problem since the performance of traditional supervised methods degrade drama...
This paper presents an approach for identifying similar documents that can be used to assist scientists in finding related work. The approach called Citation Proximity Analysis (C...
Jacqueline Leta, Birger Larsen, Ronald Rousseau, W...
Background: Document classification is a wide-spread problem with many applications, from organizing search engine snippets to spam filtering. We previously described Textpresso, ...
This paper introduces a method for automatically partitioning richly-formatted electronic documents. An automatic partitioning system has many potential uses, but we focus here on ...
We have developed a well performing algorithm for locating address blocks in postal parcel images. Both machine printed and handwritten addresses are processed by the algorithm. T...