A Flexible Structured-Based Representation for XML Document Mining

Download hal.inria.fr

This paper reports on the INRIA group’s approach to XML mining while participating in the INEX XML Mining track 2005. We use a flexible representation of XML documents that allows taking into account the structure only or both the structure and content. Our approach consists of representing XML documents by a set of their sub-paths, defined according to some criteria (length, root beginning, leaf ending). By considering those sub-paths as words, we can use standard methods for vocabulary reduction, and simple clustering methods such as k-means. We use an implementation of the clustering algorithm known as dynamic clouds that can work with distinct groups of independent modalities put in separate variables. This is useful in our model since embedded sub-paths are not independent: we split potentially dependent paths into separate variables, resulting in each of them containing independent paths. Experiments with the INEX collections show good results for the structure-only collection...
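The sub-path representation described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact definition of a sub-path, so this assumes that a sub-path is a contiguous run of tag names along a root-to-leaf path and that the three criteria (length, root beginning, leaf ending) act as simple filters. The function names and the parameters `min_len`, `max_len`, `from_root` and `to_leaf` are hypothetical and are not taken from the paper.

```python
import xml.etree.ElementTree as ET


def root_to_leaf_paths(xml_text):
    """Return every root-to-leaf tag path of a document as a tuple of tag names."""
    root = ET.fromstring(xml_text)
    paths = []

    def walk(node, prefix):
        current = prefix + (node.tag,)
        children = list(node)
        if not children:
            # A node without child elements ends a root-to-leaf path.
            paths.append(current)
        for child in children:
            walk(child, current)

    walk(root, ())
    return paths


def sub_paths(path, min_len=1, max_len=None, from_root=False, to_leaf=False):
    """Contiguous sub-paths of one path, filtered by the (assumed) criteria."""
    n = len(path)
    max_len = n if max_len is None else max_len
    result = set()
    for start in range(n):
        if from_root and start != 0:
            continue  # keep only sub-paths that begin at the root
        for end in range(start + 1, n + 1):
            if to_leaf and end != n:
                continue  # keep only sub-paths that end at a leaf
            if min_len <= end - start <= max_len:
                result.add("/".join(path[start:end]))
    return result


def document_tokens(xml_text, **criteria):
    """Set of sub-path 'words' that represents one document."""
    tokens = set()
    for path in root_to_leaf_paths(xml_text):
        tokens |= sub_paths(path, **criteria)
    return tokens


# Hypothetical toy document, not taken from the INEX collections.
doc = "<article><title/><body><sec><p/></sec></body></article>"
tokens = document_tokens(doc, min_len=2, max_len=3)
print(sorted(tokens))
```

Each token can then be treated like a word in a bag-of-words model, which is what makes vocabulary reduction and k-means style clustering applicable. The sketch also shows why embedded sub-paths are not independent: `body/sec` occurs in every document that contains `body/sec/p`.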

Anne-Marie Vercoustre, Mounir Fegas, Saba Gul, Yves Lechevallier

INEX 2005 | INEX XML Mining | Information Management | XML Documents | Xml Mining |

Related Content

» XML algebras for data mining

» Structure-Based Document Model with Discrete Wavelet Transforms and Its Application to Docu...

» Template guided association rule mining from XML documents

» STEX a system for flexible formalization of linked data

» Conceptual Design of XML Document Warehouses

» Implementing and Optimizing Fine-Granular Lock Management for XML Document Trees

» Razor: mining distance-constrained embedded subtrees

» Warehousing complex data from the web

» XProj: a framework for projected structural clustering of XML documents

Post Info

Added	27 Jun 2010
Updated	27 Jun 2010
Type	Conference
Year	2005
Where	INEX
Authors	Anne-Marie Vercoustre, Mounir Fegas, Saba Gul, Yves Lechevallier
