To answer user queries, a data integration system employs a set of semantic mappings between the mediated schema and the schemas of data sources. In dynamic environments sources often undergo changes that invalidate the mappings. Hence, once the system is deployed, the administrator must monitor it over time, to detect and repair broken mappings. Today such continuous monitoring is extremely labor intensive, and poses a key bottleneck to the widespread deployment of data integration systems in practice. We describe Maveric, an automatic solution to detecting broken mappings. At the heart of Maveric is a set of computationally inexpensive modules called sensors, which capture salient characteristics of data sources (e.g., value distributions, HTML layout properties). We describe how Maveric trains and deploys the sensors to detect broken mappings. Next we develop three novel improvements: perturbation (i.e., injecting artificial changes into the sources) and multi-source training to i...
Robert McCann, Bedoor K. AlShebli, Quoc Le, Hoa Ng