WWW
2007
ACM

GigaHash: Scalable Minimal Perfect Hashing for Billions of URLs

A minimal perfect hash function maps a static set of n keys onto the range of integers {0, 1, 2, ..., n - 1}. We present a scalable, high-performance algorithm based on random graphs for constructing minimal perfect hash functions (MPHFs). For a set of n keys, our algorithm outputs a description of h in expected time O(n). The evaluation of h(x) requires three memory accesses for any key x, and the description of h takes up 0.89n bytes (7.13n bits). This is the best (most space-efficient) known result to date. Using a simple heuristic and Huffman coding, the space requirement is further reduced to 0.79n bytes (6.86n bits). We present a high-performance architecture that is easy to parallelize and scales well to the very large data sets encountered in internet search applications. Experimental results on a one-billion-URL dataset obtained from Live Search crawl data show that the proposed algorithm (a) finds an MPHF for one billion URLs in less than 4 minutes, and (b) requires only 6.86 bits/key for the descri...
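To make the idea of a graph-based MPHF concrete, here is a minimal Python sketch. It is not the GigaHash algorithm from the paper: it uses the classic two-hash acyclic-graph construction (in the style of Czech, Havas and Majewski), chosen only because it is short. The names `build_mphf`, `lookup` and `_hash`, the ratio `c = 2.1`, and the use of BLAKE2b as the hash are all assumptions made for illustration. Each key becomes an edge between two hashed vertices; if the resulting graph is acyclic, vertex labels `g` can be assigned so that the two labels of a key's edge sum to that key's index modulo n.

```python
import hashlib


def _hash(key: str, seed: int, m: int) -> int:
    """Seeded hash of key into [0, m). BLAKE2b is an illustrative choice."""
    digest = hashlib.blake2b(
        key.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little") % m


def _assign(adj, m, n):
    """Label vertices so g[u] + g[v] == edge index (mod n); None if cyclic."""
    g = [0] * m
    visited = [False] * m
    for root in range(m):
        if visited[root]:
            continue
        visited[root] = True
        stack = [(root, -1)]
        while stack:
            u, parent_edge = stack.pop()
            for v, i in adj[u]:
                if i == parent_edge:
                    continue  # do not walk back along the edge we arrived by
                if visited[v]:
                    return None  # cycle (or duplicate edge): retry with new seeds
                visited[v] = True
                g[v] = (i - g[u]) % n
                stack.append((v, i))
    return g


def build_mphf(keys, c=2.1, max_tries=100):
    """Return (seeds, m, g) describing an MPHF over the distinct keys."""
    n = len(keys)
    m = max(2, int(c * n) + 1)  # more vertices than edges, so acyclic graphs are likely
    for attempt in range(max_tries):
        s1, s2 = 2 * attempt, 2 * attempt + 1
        adj = [[] for _ in range(m)]
        self_loop = False
        for i, k in enumerate(keys):
            u, v = _hash(k, s1, m), _hash(k, s2, m)
            if u == v:
                self_loop = True
                break
            adj[u].append((v, i))
            adj[v].append((u, i))
        if self_loop:
            continue
        g = _assign(adj, m, n)
        if g is not None:
            return (s1, s2), m, g
    raise RuntimeError("no acyclic graph found; increase c or max_tries")


def lookup(key, seeds, m, g, n):
    """Evaluate the MPHF: two hash computations and two table reads."""
    s1, s2 = seeds
    return (g[_hash(key, s1, m)] + g[_hash(key, s2, m)]) % n


keys = [f"http://example.com/page/{i}" for i in range(1000)]
seeds, m, g = build_mphf(keys)
values = sorted(lookup(k, seeds, m, g, len(keys)) for k in keys)
```

In this sketch `g` holds a full integer per vertex, so it uses far more space than the 7.13 bits per key quoted in the abstract. It shows only the defining property: every key in the static set maps to a distinct integer in {0, ..., n - 1}.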
Added: 22 Nov 2009
Updated: 22 Nov 2009
Type: Conference
Year: 2007
Where: WWW
Authors: Kumar Chellapilla, Anton Mityagin, Denis Xavier Charles