As an important technique for data analysis, clustering has been employed in many applications such as image segmentation, document clustering and vector quantization. Divisive cl...
Semi-supervised clustering models that incorporate user-provided constraints to yield meaningful clusters have recently become a popular area of research. In this paper...
Logical entity recognition in heterogeneous collections of document page images remains a challenging problem since the performance of traditional supervised methods degrades drama...
Background: Document classification is a widespread problem with many applications, from organizing search engine snippets to spam filtering. We previously described Textpresso, ...
This paper introduces a method for automatically partitioning richly-formatted electronic documents. An automatic partitioning system has many potential uses, but we focus here on ...