Single versus Multiple Sorting in All Pairs Similarity Search

To save memory and improve speed, vectorial data such as images and signals are often represented as strings of discrete symbols (i.e., sketches). Charikar (2002) proposed a fast approximate method for finding neighbor pairs of strings by sorting them and scanning the sorted list with a small window. This method, which we call "single sorting", is applied to locality-sensitive codes and is widely used in speed-critical web applications. To improve on single sorting, we propose a novel method that employs blockwise masked sorting. Our method can dramatically reduce the number of candidate pairs that must be verified by distance calculation, in exchange for a larger number of sorting operations. It is therefore especially attractive for high-dimensional dense data, where distance calculation is expensive. Empirical results show the efficiency of our method in comparison with single sorting and recent fast nearest neighbor methods.
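The Python sketch below contrasts the two strategies on toy binary sketches. It is a simplified illustration, not the authors' implementation: the function names, the block-splitting rule, the meaning of `window` (number of following strings compared in sorted order), and the set-based removal of duplicate candidates are all assumptions, and the paper's method may organize its sorts and avoid re-checking pairs differently. The exactness of the masked variant rests on the pigeonhole principle: if two sketches differ in at most d symbols and are split into b > d blocks, then at most d blocks contain a mismatch, so masking some choice of d blocks leaves all remaining blocks identical.

```python
# Hypothetical helper names; a simplified illustration, not the paper's code.
import random
from itertools import combinations, groupby


def hamming(a: str, b: str) -> int:
    """Number of positions where two equal-length sketches differ."""
    return sum(x != y for x, y in zip(a, b))


def brute_force(sketches, max_dist):
    """Exact reference: verify every pair (quadratic number of checks)."""
    return {(i, j) for i, j in combinations(range(len(sketches)), 2)
            if hamming(sketches[i], sketches[j]) <= max_dist}


def single_sorting(sketches, window, max_dist):
    """Sort once lexicographically, then pair each string with the next
    `window` strings in sorted order. Approximate: near neighbors that
    differ in an early symbol can land far apart in the order."""
    order = sorted(range(len(sketches)), key=lambda i: sketches[i])
    candidates = set()
    for pos, i in enumerate(order):
        for j in order[pos + 1:pos + 1 + window]:
            candidates.add((min(i, j), max(i, j)))
    hits = {(i, j) for i, j in candidates
            if hamming(sketches[i], sketches[j]) <= max_dist}
    return hits, len(candidates)


def multiple_sorting(sketches, n_blocks, max_dist):
    """Blockwise masked sorting: for every choice of max_dist masked blocks,
    sort by the unmasked blocks and pair strings whose unmasked blocks match."""
    assert n_blocks > max_dist, "need more blocks than the distance threshold"
    length = len(sketches[0])
    bounds = [k * length // n_blocks for k in range(n_blocks + 1)]
    blocks = [[s[bounds[k]:bounds[k + 1]] for k in range(n_blocks)]
              for s in sketches]
    candidates = set()  # a pair may surface under several masks; dedupe here
    for masked in combinations(range(n_blocks), max_dist):
        kept = [k for k in range(n_blocks) if k not in masked]

        def key(i):
            return tuple(blocks[i][k] for k in kept)

        order = sorted(range(len(sketches)), key=key)
        # Strings with identical unmasked blocks are contiguous after sorting.
        for _, group in groupby(order, key=key):
            candidates.update(combinations(sorted(group), 2))
    hits = {(i, j) for i, j in candidates
            if hamming(sketches[i], sketches[j]) <= max_dist}
    return hits, len(candidates)


if __name__ == "__main__":
    rng = random.Random(0)
    base = ["".join(rng.choice("01") for _ in range(32)) for _ in range(50)]
    # Add near-duplicates by flipping up to two random bits of each base sketch.
    noisy = []
    for s in base:
        bits = list(s)
        for p in rng.sample(range(32), rng.randint(0, 2)):
            bits[p] = "1" if bits[p] == "0" else "0"
        noisy.append("".join(bits))
    sketches = base + noisy

    exact = brute_force(sketches, max_dist=2)
    single, n_single = single_sorting(sketches, window=3, max_dist=2)
    multi, n_multi = multiple_sorting(sketches, n_blocks=4, max_dist=2)
    print(f"exact pairs: {len(exact)}")
    print(f"single sorting: {len(single)} found, {n_single} candidates verified")
    print(f"multiple sorting: {len(multi)} found, {n_multi} candidates verified")
```

By the pigeonhole argument, `multiple_sorting` should return exactly the pairs that `brute_force` finds, while `single_sorting` can miss pairs whose first differing symbol comes early. On a toy sample this small, compare the printed candidate counts directly rather than assuming which strategy verifies fewer pairs; the trade-off the abstract describes (more sorts, fewer distance calculations) matters most for large, high-dimensional data.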

Yasuo Tabei, Takeaki Uno, Masashi Sugiyama, Koji Tsuda

Distance Calculation | Fast Approximate Method | JMLR 2010 | Sorting

» CoPub Mapper mining MEDLINE based on search term copublication

» COMPASS server for remote homology inference

» CLEF 2005 Multilingual Retrieval by Combining Multiple Multilingual Ranked Lists

Post Info

Added	19 May 2011
Updated	19 May 2011
Type	Journal
Year	2010
Where	JMLR
Authors	Yasuo Tabei, Takeaki Uno, Masashi Sugiyama, Koji Tsuda

Sciweavers