The similarity join has become an important database primitive to support similarity search and data mining. A similarity join combines two sets of complex objects such that the r...
Abstract-- Similarity join is a useful primitive operation underlying many applications, such as near duplicate Web page detection, data integration, and pattern recognition. Tradi...
Chuan Xiao, Wei Wang 0011, Xuemin Lin, Haichuan Sh...
Current data repositories include a variety of data types, including audio, images and time series. State of the art techniques for indexing such data and doing query processing r...
A string similarity join finds similar pairs between two collections of strings. It is an essential operation in many applications, such as data integration and cleaning, and has ...
In this paper we present an efficient, scalable and general algorithm for performing set joins on predicates involving various similarity measures like intersect size, Jaccard-coe...