An Efficient Similarity Join Algorithm with Cosine Similarity Predicate

Given a large collection of objects, finding all pairs of similar objects, known as similarity join, is widely used to solve problems in many application domains. The computation time of similarity join is a critical issue, since similarity join requires computing similarity values for all possible pairs of objects. Several existing algorithms adopt prefix filtering to avoid unnecessary similarity computations; however, these algorithms filter out object pairs inefficiently, particularly when an aggregate weighted similarity function, such as cosine similarity, is used to quantify the similarity between objects. This inefficiency is mostly caused by the large prefixes the algorithms select. In this paper, we propose an alternative method that selects small prefixes by exploiting the relationship between the arithmetic mean and the geometric mean of elements' weights. A new algorithm, MMJoin, implementing the proposed methods dramatically reduces the average size...
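To make the problem statement concrete, the following is a minimal sketch of the brute-force baseline that the abstract describes as costly: it computes the cosine similarity of every pair of objects and keeps the pairs at or above a threshold. It is not MMJoin and does no prefix filtering. The sparse `{feature: weight}` representation, the function names `cosine` and `naive_similarity_join`, and the sample data are assumptions made for illustration only.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two sparse vectors given as {feature: weight} dicts."""
    # Iterate over the smaller vector so the dot product touches fewer entries.
    if len(u) > len(v):
        u, v = v, u
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def naive_similarity_join(objects, threshold):
    """Return every (i, j, sim) with i < j and cosine similarity >= threshold.

    This examines all n * (n - 1) / 2 pairs, which is the cost that
    filtering techniques such as prefix filtering aim to avoid.
    """
    result = []
    for (i, u), (j, v) in combinations(enumerate(objects), 2):
        sim = cosine(u, v)
        if sim >= threshold:
            result.append((i, j, sim))
    return result


# Hypothetical sample data: three objects with weighted features.
docs = [
    {"join": 1.0, "cosine": 2.0},
    {"join": 1.0, "cosine": 2.0, "prefix": 0.5},
    {"traffic": 3.0},
]
pairs = naive_similarity_join(docs, threshold=0.9)
```

With this sample data only objects 0 and 1 share features, so they form the only candidate pair that can pass the threshold. The quadratic loop is what makes computation time the critical issue the abstract refers to as the collection grows.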

Dongjoo Lee, Jaehui Park, Junho Shim, Sang-goo Lee

Algorithm | Database | DEXA 2010 | Similarity Join | Similarity Values

Post Info

Added	06 Dec 2010
Updated	06 Dec 2010
Type	Conference
Year	2010
Where	DEXA
Authors	Dongjoo Lee, Jaehui Park, Junho Shim, Sang-goo Lee
