Document image understanding denotes the recognition of semantically relevant components in the layout extracted from a document image. This recognition process is based on visual models, whose manual specification can be a highly demanding task. To acquire these models automatically, we propose the application of machine learning techniques. In this paper, problems raised by possible dependencies between the concepts to be learned are illustrated and solved with a computational strategy based on separate-and-parallel-conquer search. The approach is tested on a set of real multi-page documents processed by the WISDOM++ system. New results confirm the validity of the proposed strategy and show some limits of the machine learning system used in this work.
Floriana Esposito, Donato Malerba, Francesca A. Li