As network traffic continues to grow and packets must be processed at line rate, high-performance routers need to forward millions of packets every second. Even with an efficient lookup algorithm such as the LC-trie, each packet needs up to 5 memory accesses. Earlier work shows that a single cache for the nodes of an LC-trie can reduce the number of external memory accesses. We observe that the locality characteristics of the level-one nodes of an LC-trie differ significantly from those of lower-level nodes. Hence, we propose a heterogeneously segmented cache architecture (HSCA) that uses separate caches for level-one and lower-level nodes, each with a carefully chosen size. We further improve the hit rate of the level-one node cache by introducing a weight-based replacement policy and an intelligent index bit selection scheme. To evaluate our cache scheme with realistic traces, we propose a synthetic trace generation method which emulates real traces and can ...