Epsilon Grid Order: An Algorithm for the Similarity Join on Massive High-Dimensional Data


Download www.dbs.informatik.uni-muenchen.de

The similarity join is an important database primitive which has been successfully applied to speed up applications such as similarity search, data analysis and data mining. The similarity join combines two point sets of a multidimensional vector space such that the result contains all point pairs where the distance does not exceed a parameter ε. In this paper, we propose the Epsilon Grid Order, a new algorithm for determining the similarity join of very large data sets. Our solution is based on a particular sort order of the data points, which is obtained by laying an equidistant grid with cell length ε over the data space and comparing the grid cells lexicographically. A typical problem of grid-based approaches such as MSJ or the ε-kdB-tree is that large portions of the data sets must be held simultaneously in main memory. Therefore, these approaches do not scale to large data sets. Our technique avoids this problem by an external sorting algorithm and a particular scheduling strategy...
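
To make the sort order and the join predicate concrete, here is a minimal in-memory sketch in Python. It is not the paper's algorithm: the external sorting and the scheduling strategy mentioned in the abstract are omitted, and the neighbor-cell lookup through a dictionary is a simplification chosen for illustration. The function names (`grid_cell`, `epsilon_grid_sort`, `similarity_join`) and the use of Euclidean distance are assumptions, not taken from the paper.

```python
import math
from itertools import product


def grid_cell(point, eps):
    """Integer coordinates of the grid cell (cell length eps) containing point."""
    return tuple(math.floor(x / eps) for x in point)


def epsilon_grid_sort(points, eps):
    """Sort points by comparing their grid cells lexicographically."""
    # Python compares tuples lexicographically, which matches the order
    # described in the abstract. Points in the same cell keep input order.
    return sorted(points, key=lambda p: grid_cell(p, eps))


def similarity_join(r, s, eps):
    """All pairs (p, q), p in r and q in s, with Euclidean distance <= eps.

    Illustrative assumption: s fits in memory and is bucketed by grid cell.
    """
    buckets = {}
    for q in s:
        buckets.setdefault(grid_cell(q, eps), []).append(q)

    result = []
    for p in epsilon_grid_sort(r, eps):
        cell = grid_cell(p, eps)
        # Two points within distance eps lie in cells whose coordinates
        # differ by at most 1 in every dimension, so only those cells
        # need to be inspected.
        for offset in product((-1, 0, 1), repeat=len(cell)):
            neighbor = tuple(c + o for c, o in zip(cell, offset))
            for q in buckets.get(neighbor, ()):
                if math.dist(p, q) <= eps:
                    result.append((p, q))
    return result


if __name__ == "__main__":
    r = [(0.1, 0.1), (2.5, 2.5)]
    s = [(0.3, 0.2), (5.0, 5.0), (2.4, 2.9)]
    for pair in similarity_join(r, s, eps=0.5):
        print(pair)
```

Note that this sketch enumerates 3^d neighboring cells per point, which is only workable for small dimension d, and it keeps one whole data set in memory. That memory requirement is the scaling problem the abstract attributes to grid-based approaches and that the paper's external sorting and scheduling are meant to avoid.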

Christian Böhm, Bernhard Braunmüller, Fl


Database | Keywords: Similarity Join | Large Data Sets | SIGMOD 2001


Related Content

» High-Dimensional Similarity Joins

» GESS: a scalable similarity-join algorithm for mining large data sets in high dimensional sp...

» A Scalable Parallel Subspace Clustering Algorithm for Massive Data Sets

» Web data integration using approximate string join

Post Info

Added	08 Dec 2009
Updated	08 Dec 2009
Type	Conference
Year	2001
Where	SIGMOD
Authors	Christian Böhm, Bernhard Braunmüller, Florian Krebs, Hans-Peter Kriegel
