Abstract— Collected data often contains uncertainties. Probabilistic databases have been proposed to manage uncertain data. To combine data from multiple autonomous probabilistic...
Fabian Panse, Maurice van Keulen, Ander de Keijzer...
— One of the most prominent data quality problems is the existence of duplicate records. Current data cleaning systems usually produce one clean instance (repair) of the input da...
George Beskales, Mohamed A. Soliman, Ihab F. Ilyas...
—In this paper, we propose a probabilistic algorithm for detecting near duplicate text, audio, and video resources efficiently and effectively in large-scale P2P systems. To thi...
Odysseas Papapetrou, Sukriti Ramesh, Stefan Siersd...
The WHO Collaborating Centre for International Drug Monitoring in Uppsala, Sweden, maintains and analyses the world's largest database of reports on suspected adverse drug re...
Abstract. Efficiently detecting near duplicate resources is an important task when integrating information from various sources and applications. Once detected, near duplicate reso...