It has frequently been observed that most of the world’s data lies outside database systems. The reason is that database systems focus on structured data, leaving the unstructured realm to others. The world of unstructured data has several very appealing properties, such as ease of authoring, querying and data sharing. In contrast, authoring, querying and sharing structured data require significant effort, albeit with the benefit of rich query languages and exact answers. We argue that in order to broaden the use of data management tools, we need a concerted effort to cross this structure chasm, by importing the attractive properties of the unstructured world into the structured one. As an initial effort in this direction, we introduce the REVERE System, which offers several mechanisms for crossing the structure chasm and takes the chasm on the WWW as its first application. REVERE includes three innovations: (1) a data creation environment that entices people to structure da...
Alon Y. Halevy, Oren Etzioni, AnHai Doan, Zachary