Exact and approximate membership lookups are among the most widely used primitives in a number of network applications. Hash tables are commonly used to implement these primitive functions as they provide O(1) operations at moderate load (table occupancy). However, at high load, collisions become prevalent in the table, which makes lookup highly non-deterministic and reduces the average performance. Slow and non-deterministic lookups are detrimental to the performance and scalability of modern platforms such as ASIC/FPGA and multi-core that use highly parallel compute and memory structures. To combat non-determinism and achieve high rate lookups, a recent series of papers employ compact on-chip memory that augments the main hash table and stores certain key information. Unfortunately, they require substantial on-chip memory space and bandwidth, and fail to provide 100% guarantee on lookup rate. In this paper, we solve this with a novel construction that requires 10-fold smaller on-chi...