This paper presents the application of first-order logic machine learning techniques to two document domains, with the goal of learning rules that recognize the semantic role of the documents' logical components. Specifically, the multistrategy incremental learning system INTHELEX is applied to multi-format scientific papers and to documents concerning European films from the 1920s and 1930s. The challenge lies in the different degrees of formatting standardization across these domains: from (more or less) standardized layouts in scientific papers to documents with almost no standard in historical cultural heritage material. Experimental results in both domains, together with a comparison with the Progol system, assess the advantages that INTHELEX can yield.