We study the design of data-centric XML documents in which (1) there is no mixed content, i.e., each element either has subelements and/or attributes, or has a single value in the form of a character string, but never a mixture of strings with subelements or attributes; and (2) the order of subelements is insignificant. We give a new definition of functional dependency (FD) for XML that generalizes previously published definitions. We also define equality-generating dependencies (EGDs) for XML, which, to our knowledge, have not been studied before. We show how EGDs and FDs can be used to detect data redundancy in XML, and we propose normal forms for DTDs with respect to these constraints. We show that our normal forms are necessary and sufficient for all conforming XML documents to be free of redundancy. In passing, we define a normal form for relational databases, based on EGDs, that can help remove data redundancy across multiple relations.
Junhu Wang, Rodney W. Topor
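As a rough illustration of the underlying intuition that an FD exposes redundancy, the sketch below works with a flat relation rather than XML and does not use the XML FD or EGD definitions given in the paper. It assumes rows are Python dicts and uses a simplified notion of redundancy: a right-hand-side value counts as redundant when another row shares the same left-hand-side values, because the FD then forces the value to repeat. The function names `satisfies_fd` and `redundant_cells` are hypothetical.

```python
def satisfies_fd(rows, lhs, rhs):
    """Return True if every pair of rows that agrees on `lhs` also agrees on `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # The first row with this lhs key fixes the rhs value; any later mismatch violates the FD.
        if seen.setdefault(key, val) != val:
            return False
    return True


def redundant_cells(rows, lhs, rhs):
    """List (row index, attribute) cells whose value the FD implies from some other row.

    Simplified notion (an assumption of this sketch): an rhs cell is redundant
    whenever at least one other row has the same lhs values.
    """
    groups = {}
    for i, row in enumerate(rows):
        groups.setdefault(tuple(row[a] for a in lhs), []).append(i)
    cells = []
    for idxs in groups.values():
        if len(idxs) > 1:  # the rhs value is repeated across these rows
            for i in idxs:
                for a in rhs:
                    cells.append((i, a))
    return cells


# Hypothetical data: each course is taught by a single teacher (FD course -> teacher).
rows = [
    {"course": "DB", "teacher": "Smith", "student": "Ann"},
    {"course": "DB", "teacher": "Smith", "student": "Bob"},
    {"course": "AI", "teacher": "Lee", "student": "Ann"},
]
print(satisfies_fd(rows, ["course"], ["teacher"]))     # True
print(redundant_cells(rows, ["course"], ["teacher"]))  # [(0, 'teacher'), (1, 'teacher')]
```

In this example "Smith" is stored twice even though one copy plus the FD determines the other. Normalization targets exactly this kind of repetition, which the paper studies for XML documents conforming to a DTD.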