Synthetically generated data has always been important for evaluating and understanding new ideas in database research. In this paper, we describe a generator for synthetic, complex-structured XML data that allows a high level of control over the characteristics of the generated data. This data generator is certainly not the ultimate solution to the problem of generating synthetic XML data, but we have found it very useful in our research on XML data management, and we believe that it can also be useful to other researchers. Furthermore, we hope that this paper starts a discussion in the XML community about characterizing and generating XML data, and that it may serve as a first step towards developing a commonly accepted XML data generator for our community.
Ashraf Aboulnaga, Jeffrey F. Naughton, Chun Zhang