Classification of Documents Based on the Structure of Their DOM Trees

Download www.peter-geibel.de

In this paper, we discuss kernels that can be applied for the classification of XML documents based on their DOM trees. DOM trees are ordered trees in which every node might be labeled by a vector of attributes including its XML tag and the textual content. We describe five new kernels suitable for such structures: a kernel based on predefined structural features, a tree kernel derived from the well-known parse tree kernel, the set tree kernel that allows permutations of children, the string tree kernel being an extension of the so-called partial tree kernel, and the soft tree kernel. We evaluate the kernels experimentally on a corpus containing the DOM trees of newspaper articles and on the well-known SUSANNE corpus.
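To make the idea of a kernel over DOM trees concrete, the following is a minimal sketch of the classic parse tree kernel (in the style of Collins and Duffy) adapted to XML element trees. It is not one of the five kernels from the paper. The function names `production` and `tree_kernel`, the `decay` parameter, and the choice to compare nodes only by tag (ignoring attributes and textual content) are assumptions made for illustration. As a further simplification, two childless elements with the same tag count as a match.

```python
import xml.etree.ElementTree as ET


def production(node):
    """Hypothetical helper: a node's tag plus the ordered tags of its children."""
    return (node.tag, tuple(child.tag for child in node))


def tree_kernel(xml_a, xml_b, decay=1.0):
    """Count common ordered tree fragments of two XML documents.

    Simplified parse-tree-kernel recursion: nodes are compared by tag only,
    and `decay` (an assumed parameter) down-weights larger fragments.
    """
    root_a = ET.fromstring(xml_a)
    root_b = ET.fromstring(xml_b)
    memo = {}

    def common(a, b):
        # Weighted number of common fragments rooted at nodes a and b.
        key = (id(a), id(b))
        if key in memo:
            return memo[key]
        if production(a) != production(b):
            value = 0.0
        else:
            value = decay
            # Productions match, so both nodes have the same number of children.
            for child_a, child_b in zip(a, b):
                value *= 1.0 + common(child_a, child_b)
        memo[key] = value
        return value

    # Sum over all pairs of nodes, one from each tree.
    return sum(common(a, b) for a in root_a.iter() for b in root_b.iter())


if __name__ == "__main__":
    doc_1 = "<article><title/><body><p/><p/></body></article>"
    doc_2 = "<article><title/><body><p/></body></article>"
    print(tree_kernel(doc_1, doc_1))
    print(tree_kernel(doc_1, doc_2))
```

Because the recursion requires children to match position by position, this sketch is sensitive to the order and number of children. A kernel that tolerates permutations of children, such as the set tree kernel mentioned in the abstract, would need a different matching step.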

Peter Geibel, Olga Pustylnikov, Alexander Mehler, Helmar Gust, Kai-Uwe Kühnberger

DOM Trees | ICONIP 2007 | Information Technology | Set Tree Kernel | Tree Kernel

» Variants of Tree Kernels for XML Documents

» Document Transformation System from Papers to XML Data Based on Pivot XML Document Method

» Set-at-a-time access to XML through DOM

» A DOM Tree Alignment Model for Mining Parallel Data from the Web

» DoDOM: Leveraging DOM Invariants for Web 2.0 Application Robustness Testing

» A Data Parallel Algorithm for XML DOM Parsing

» A document object modeling method to retrieve data from a very large XML document

» XRules: an effective structural classifier for XML data

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2007
Where	ICONIP
Authors	Peter Geibel, Olga Pustylnikov, Alexander Mehler, Helmar Gust, Kai-Uwe Kühnberger