An organization makes a new release as new information becomes available, releases a tailored view for each data request, releases sensitive information and identifying information...
We show that aggregate constraints (as opposed to pairwise constraints), which often arise when integrating multiple sources of data, can be leveraged to enhance the quality of dedu...
Surajit Chaudhuri, Anish Das Sarma, Venkatesh Gant...
A similarity join correlating fragments in XML documents that are similar in structure and content can be used as the core algorithm to support data cleaning and data integratio...
Simplifying data programming is a core mission of data management research. The issue at stake is to help engineers build efficient and robust data-centric applications. The fron...
The detection of duplicate tuples that correspond to the same real-world entity is an important task in data integration and cleaning. While many techniques exist to identify such...