b-Bit minwise hashing


Available from research.microsoft.com

This paper establishes the theoretical framework of b-bit minwise hashing. The original minwise hashing method has become a standard technique for estimating set similarity (e.g., resemblance) with applications in information retrieval, data management, computational advertising, etc. By only storing b bits of each hashed value (e.g., b = 1 or 2), we gain substantial advantages in terms of storage space. We prove the basic theoretical results and provide an unbiased estimator of the resemblance for any b. We demonstrate that, even in the least favorable scenario, using b = 1
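The idea of storing only b bits per hashed value can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on two assumptions. First, random linear hash functions h(x) = (a*x + c) mod (2^61 - 1) stand in for the random permutations that minwise hashing assumes. Second, resemblance is estimated by inverting the sparse-data simplification P_b = 2^-b + (1 - 2^-b) R, where P_b is the probability that two b-bit values match. The paper's general unbiased estimator adds correction terms that depend on the set sizes relative to the universe. The names `make_hashes`, `minhash_signature`, `truncate` and `estimate_resemblance` are hypothetical.

```python
import random

# Mersenne prime 2^61 - 1; element IDs must be smaller than this.
PRIME = (1 << 61) - 1


def make_hashes(k, seed=0):
    """Draw k hypothetical hash functions h(x) = (a*x + c) mod PRIME.

    Random linear hashing is used here only as an approximation to the
    random permutations assumed by minwise hashing.
    """
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(k)]


def minhash_signature(s, hashes):
    """Full minwise hash values of set s, one per hash function."""
    return [min((a * x + c) % PRIME for x in s) for a, c in hashes]


def truncate(sig, b):
    """Keep only the lowest b bits of each minwise hash value."""
    mask = (1 << b) - 1
    return [v & mask for v in sig]


def estimate_resemblance(sig1, sig2, b):
    """Estimate R from the fraction of matching b-bit values.

    Inverts the sparse-data approximation P_b = 2^-b + (1 - 2^-b) * R;
    the paper's general estimator adds corrections that depend on set sizes.
    """
    p_hat = sum(u == v for u, v in zip(sig1, sig2)) / len(sig1)
    base = 1.0 / (1 << b)
    return (p_hat - base) / (1.0 - base)


def resemblance(s1, s2):
    """Exact resemblance |S1 & S2| / |S1 | S2|, for comparison."""
    return len(s1 & s2) / len(s1 | s2)


if __name__ == "__main__":
    rng = random.Random(7)
    ids = rng.sample(range(10**9), 1500)      # 1500 distinct element IDs
    s1, s2 = set(ids[:1000]), set(ids[500:])  # the two sets share 500 elements
    hashes = make_hashes(k=2000, seed=42)
    full1 = minhash_signature(s1, hashes)
    full2 = minhash_signature(s2, hashes)
    print("exact R:", resemblance(s1, s2))    # 500 / 1500
    for b in (1, 2, 8):
        est = estimate_resemblance(truncate(full1, b), truncate(full2, b), b)
        print(f"b={b}: estimated R = {est:.3f}")
```

The full minwise values are computed once and then truncated, so the same hashes can be compared at several values of b. With b = 1 and k = 2000, a signature occupies 2000 bits (250 bytes). Storing each value as a 64-bit word would take 16,000 bytes. The price is extra variance: random b-bit collisions between different minima must be subtracted out, which the 2^-b term accounts for.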

Ping Li, Arnd Christian König


Internet Technology | Minwise Hashing Method | Set Similarity | Theoretical Framework | WWW 2010


» On the k-Independence Required by Linear Probing and Minwise Independence

» Fingerprinting Ratings for Collaborative Filtering: Theoretical and Empirical Analysis

» Approximate String Search in Spatial Databases

Post Info

Added	13 May 2010
Updated	13 May 2010
Type	Conference
Year	2010
Where	WWW
Authors	Ping Li, Arnd Christian König


Sciweavers
