Identification of all objects in a dataset whose similarity is not less than a specified threshold is of major importance for management, search, and analysis of data. Set similari...
A string similarity join finds similar pairs between two collections of strings. It is an essential operation in many applications, such as data integration and cleaning, and has ...
In this paper we present an efficient, scalable and general algorithm for performing set joins on predicates involving various similarity measures like intersect size, Jaccard-coe...
Similarity join is a useful primitive operation underlying many applications, such as near duplicate Web page detection, data integration, and pattern recognition. Tradi...
Chuan Xiao, Wei Wang 0011, Xuemin Lin, Haichuan Sh...
An important database primitive for commonly used feature databases is the similarity join. It combines two datasets based on some similarity predicate into one set such that the n...
Hans-Peter Kriegel, Peter Kunath, Martin Pfeifle, ...