
ICDM
2010
IEEE

Improved Consistent Sampling, Weighted Minhash and L1 Sketching

Abstract: We propose a new Consistent Weighted Sampling method, where the probability of drawing identical samples for a pair of inputs is equal to their Jaccard similarity. Our method takes deterministic constant time per non-zero weight, improving on the best previous approach, which takes expected constant time. The samples can be used as Weighted Minhash for efficient retrieval and compression (sketching) under the Jaccard or L1 metric. A method is presented for using simple data statistics to reduce the running time of hash computation by two orders of magnitude. We compare our method with the random projection method and show that it
Sergey Ioffe
Added 12 Feb 2011
Updated 12 Feb 2011
Type Conference
Year 2010
Where ICDM
Authors Sergey Ioffe
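
The property stated in the abstract (two weighted inputs draw the same sample with probability equal to their weighted Jaccard similarity) can be illustrated with a minimal Python sketch. It assumes the commonly described form of consistent weighted sampling, with two Gamma(2, 1) draws and one uniform offset per element, each keyed by the element and the hash index. The function names, the string-seeded per-element random generator, and the example weights are hypothetical choices made for illustration, not details taken from this listing, and the sketch does not include the running-time optimizations the abstract mentions.

```python
import math
import random


def weighted_jaccard(s, t):
    """Generalized Jaccard similarity: sum of minima over sum of maxima."""
    keys = set(s) | set(t)
    num = sum(min(s.get(k, 0.0), t.get(k, 0.0)) for k in keys)
    den = sum(max(s.get(k, 0.0), t.get(k, 0.0)) for k in keys)
    return num / den if den > 0 else 0.0


def cws_sample(weights, seed):
    """Draw one consistent weighted sample (key, t) from a dict of weights.

    Assumption: randomness is keyed by (seed, key) through a string-seeded
    random.Random, so every input sees the same draws for the same element.
    Work is constant per non-zero weight.
    """
    best = None
    for k, w in weights.items():
        if w <= 0:
            continue  # only non-zero weights take part
        rng = random.Random(f"{seed}:{k}")
        r = rng.gammavariate(2.0, 1.0)
        c = rng.gammavariate(2.0, 1.0)
        beta = rng.random()
        # Quantize log-weight on a randomly shifted grid of spacing r.
        t = math.floor(math.log(w) / r + beta)
        y = math.exp(r * (t - beta))
        a = c / (y * math.exp(r))
        if best is None or a < best[0]:
            best = (a, k, t)
    return None if best is None else (best[1], best[2])


def estimate_similarity(s, t, num_hashes=2000):
    """Fraction of hash indices on which the two inputs draw the same sample."""
    hits = sum(cws_sample(s, i) == cws_sample(t, i) for i in range(num_hashes))
    return hits / num_hashes
```

As a usage example under the same assumptions, for S = {"a": 2.0, "b": 1.0, "c": 0.5} and T = {"a": 1.0, "b": 1.0, "d": 1.5}, weighted_jaccard(S, T) is 2 / 5 = 0.4, and estimate_similarity(S, T) should land close to that value, with the error shrinking as num_hashes grows.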