It is becoming increasingly common to construct databases from information automatically culled from many heterogeneous sources. For example, a research publication database can b...
Aron Culotta, Michael L. Wick, Robert Hall, Matthe...
The problem of identifying approximately duplicate records in databases is an essential step for data cleaning and data integration processes. Most existing approaches have relied...
The problem of record linkage focuses on determining whether two object descriptions refer to the same underlying entity. Addressing this problem effectively has many practical ap...
1 We consider the problem of similarity search in applications where the cost of computing the similarity between two records is very expensive, and the similarity measure is not a...
Chris Jermaine, Fei Xu, Mingxi Wu, Ravi Jampani, T...
It has become a promising direction to measure similarity of Web search queries by mining the increasing amount of clickthrough data logged by Web search engines, which record the...
Qiankun Zhao, Steven C. H. Hoi, Tie-Yan Liu, Soura...