(Automatic) document classification is generally defined as content-based assignment of one or more predefined categories to documents. Usually, machine learning, statistical patt...
In this paper, a machine learning approach to support the user during the correction of the layout analysis is proposed. Layout analysis is the process of extracting a hierarchica...
Performance evaluation for document image analysis and understanding is a recurring problem. Many groundtruthed document image databases are now used to evaluate general algorithm...
A major obstacle to fully integrated deployment of many data mining algorithms is the assumption that data sits in a single table, even though most real-world databases have compl...
Alexandrin Popescul, Lyle H. Ungar, Steve Lawrence...
Despite ubiquitous claims that optical character recognition (OCR) is a "solved problem," many categories of documents continue to break modern OCR software such as docu...