Large-scale information integration, and in particular search on the World Wide Web, is pushing the limits of how structured and unstructured data can be combined. As the number of information sources we combine grows, our ability to model the domain in a completely structured way diminishes. We argue that building applications that combine structured and unstructured data requires a new modeling tool. We consider the problem of modeling an application domain whose data may be partially structured and partially unstructured. In particular, we are concerned with applications where the border between the structured and unstructured parts of the data is not well defined, is not well known in advance, or may evolve over time. We propose the concept of malleable schemas as a modeling tool that enables incorporating both structured and unstructured data from the very beginning, and evolving one’s model as it becomes more structured. A malleable s...
Xin Dong, Alon Y. Halevy