Similarity joins are important operations with a broad range of applications. In this paper, we study the problem of vector similarity join size estimation (VSJ). It is a generali...
— Similarity joins have attracted significant interest, with applications in Geographical Information Systems, astronomy, marketing analyzes, and anomaly detection. However, all...
Brent Bryan, Frederick Eberhardt, Christos Falouts...
—Fuzzy/similarity joins have been widely studied in the research community and extensively used in real-world applications. This paper proposes and evaluates several algorithms f...
Foto N. Afrati, Anish Das Sarma, David Menestrina,...
With the increasing amount of data and the need to integrate data from multiple data sources, a challenging issue is to find near duplicate records efficiently. In this paper, we ...
Chuan Xiao, Wei Wang 0011, Xuemin Lin, Jeffrey Xu ...
In this paper we present an efficient, scalable and general algorithm for performing set joins on predicates involving various similarity measures like intersect size, Jaccard-coe...