LSH forest: self-tuning indexes for similarity search

We consider the problem of indexing high-dimensional data for answering (approximate) similarity-search queries. Similarity indexes prove to be important in a wide variety of settings: Web search engines desire fast, parallel, main-memory-based indexes for similarity search on text data; database systems desire disk-based similarity indexes for high-dimensional data, including text and images; peer-to-peer systems desire distributed similarity indexes with low communication cost. We propose an indexing scheme called LSH Forest which is applicable in all the above contexts. Our index uses the well-known technique of locality-sensitive hashing (LSH), but improves upon previous designs by (a) eliminating the different data-dependent parameters for which LSH must be constantly hand-tuned, and (b) improving on LSH's performance guarantees for skewed data distributions while retaining the same storage and query overhead. We show how to construct this index in main memory, on disk, in p...
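To make the idea in the abstract concrete, here is a minimal sketch of an LSH-Forest-style index in Python. It is not the authors' implementation. It assumes cosine similarity with random-hyperplane hashing, a fixed maximum label length, and sorted label lists standing in for prefix trees. The class name `LSHForestSketch`, the helper `cosine`, and all parameters and defaults are hypothetical choices for illustration. The point it tries to show is that a query adapts its prefix length per lookup instead of relying on a hand-tuned hash length.

```python
import bisect
import math
import random


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class LSHForestSketch:
    """Toy index: each 'tree' is a sorted list of (bit-string label, point_id)."""

    def __init__(self, dim, num_trees=8, max_label_len=16, seed=0):
        rng = random.Random(seed)
        self.max_label_len = max_label_len
        # One independent set of random hyperplanes per tree;
        # each hyperplane contributes one bit of a point's label.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(max_label_len)]
            for _ in range(num_trees)
        ]
        self.trees = [[] for _ in range(num_trees)]
        self.points = {}

    def _label(self, tree_idx, vec):
        bits = []
        for plane in self.planes[tree_idx]:
            dot = sum(p * v for p, v in zip(plane, vec))
            bits.append("1" if dot >= 0 else "0")
        return "".join(bits)

    def add(self, point_id, vec):
        self.points[point_id] = vec
        for t, tree in enumerate(self.trees):
            bisect.insort(tree, (self._label(t, vec), point_id))

    @staticmethod
    def _prefix_range(tree, prefix):
        # Labels contain only '0' and '1', so prefix + '2' sorts after
        # every label that starts with the prefix.
        lo = bisect.bisect_left(tree, (prefix,))
        hi = bisect.bisect_left(tree, (prefix + "2",))
        return lo, hi

    def query(self, vec, k=1, min_candidates=None):
        # Assumption: gather at least 2*k candidates before ranking.
        m = 2 * k if min_candidates is None else min_candidates
        labels = [self._label(t, vec) for t in range(len(self.trees))]
        candidates = set()
        # Shorten the matched prefix in all trees in lockstep until
        # enough candidates are found; length 0 matches every point.
        for length in range(self.max_label_len, -1, -1):
            for tree, label in zip(self.trees, labels):
                lo, hi = self._prefix_range(tree, label[:length])
                candidates.update(pid for _, pid in tree[lo:hi])
            if len(candidates) >= m:
                break
        ranked = sorted(
            candidates, key=lambda pid: cosine(vec, self.points[pid]), reverse=True
        )
        return ranked[:k]
```

A caller would construct `LSHForestSketch(dim=...)`, call `add(point_id, vector)` for each item, and then call `query(vector, k=...)`. Because the prefix length shrinks only as far as needed, dense regions are matched on long prefixes and sparse regions on short ones, which is the self-tuning behavior the abstract describes, reproduced here only in simplified form.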

Mayank Bawa, Tyson Condie, Prasanna Ganesan

Disk-based Similarity Indexes | Internet Technology | Similarity Indexes | WWW 2005

Post Info

Added	22 Nov 2009
Updated	22 Nov 2009
Type	Conference
Year	2005
Where	WWW
Authors	Mayank Bawa, Tyson Condie, Prasanna Ganesan
