
SIGMOD
1998
ACM

Extracting Schema from Semistructured Data

Semistructured data is characterized by the lack of any fixed and rigid schema, although typically the data has some implicit structure. While the lack of fixed schema makes extracting semistructured data fairly easy and an attractive goal, presenting and querying such data is greatly impaired. Thus, a critical problem is the discovery of the structure implicit in semistructured data and, subsequently, the recasting of the raw data in terms of this structure. In this paper, we consider a very general form of semistructured data based on labeled, directed graphs. We show that such data can be typed using the greatest fixpoint semantics of monadic datalog programs. We present an algorithm for approximate typing of semistructured data. We establish that the general problem of finding an optimal such typing is NP-hard, but present some heuristics and techniques based on clustering that allow efficient and near-optimal treatment of the problem. We also present some preliminary experimental resu...
Svetlozar Nestorov, Serge Abiteboul, Rajeev Motwani
Added 05 Aug 2010
Updated 05 Aug 2010
Type Conference
Year 1998
Where SIGMOD
Authors Svetlozar Nestorov, Serge Abiteboul, Rajeev Motwani