In this paper, we discuss kernels that can be applied for the classification of XML documents based on their DOM trees. DOM trees are ordered trees in which every node might be la...
Peter Geibel, Olga Pustylnikov, Alexander Mehler, ...
We describe an HTML web page segmentation algorithm, which is applied to segment online medical journal articles (regular HTML and PDF-Converted-HTML files). The web page content ...
In this paper, we discuss tree kernels that can be applied for the classification of XML documents based on their DOM trees. DOM trees are ordered trees, in which every node might...
This paper proposes a new method for document transformation using OCR to generate various XML documents from printed documents. The proposed method adopts a hierarchical transfor...
To support the rapid growth of the web and e-commerce, W3C developed DOM as an application programming interface that the abstract, logical tree structure of an XML document. In t...