A hardware-assisted design, dubbed cache-oriented multistage structure (COMS), is proposed for fast packet forwarding. COMS incorporates small on-chip cache memory in its constituent switching elements (SEs) and is used by a parallel router to interconnect its line cards (LCs) and forwarding engines (FEs, where table lookups are performed). Each lookup result in COMS is cached in the series of SEs between the FE that performs the lookup and the LC where the lookup request originates. The cached results satisfy subsequent lookup requests for identical addresses immediately, without resorting to the FEs for time-consuming lookups, thus greatly reducing the mean lookup time. COMS calls for partitioning the set of prefixes in a routing table into subsets of roughly equal size, so that each FE holds only a small fraction of the table. This leads to a substantial saving in the SRAM required in each FE to hold its forwarding table, and the total savings of SR...
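The caching behavior described above can be illustrated with a minimal sketch. It is not the authors' implementation: the class and function names (`SwitchingElement`, `ForwardingEngine`, `forward_lookup`), the LRU replacement policy, the cache capacity, and the use of a plain dictionary in place of a real longest-prefix-match table are all assumptions made for illustration only.

```python
from collections import OrderedDict


class SwitchingElement:
    """Hypothetical SE with a small cache; LRU replacement is an assumption."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()  # destination address -> lookup result

    def get(self, addr):
        if addr in self.cache:
            self.cache.move_to_end(addr)  # mark as most recently used
            return self.cache[addr]
        return None

    def put(self, addr, result):
        self.cache[addr] = result
        self.cache.move_to_end(addr)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry


class ForwardingEngine:
    """Hypothetical FE; a dict stands in for the real forwarding table."""

    def __init__(self, table):
        self.table = table
        self.lookups = 0  # counts how often the slow path is taken

    def lookup(self, addr):
        self.lookups += 1
        return self.table.get(addr)


def forward_lookup(addr, path, fe):
    """Send a request from the LC through the SEs in `path` toward `fe`."""
    for se in path:
        hit = se.get(addr)
        if hit is not None:
            return hit  # served from an SE cache; the FE is not consulted
    result = fe.lookup(addr)
    if result is not None:
        for se in path:  # cache the result in every SE on the return path
            se.put(addr, result)
    return result


# Three SEs between one LC and one FE (sizes chosen arbitrarily).
path = [SwitchingElement(capacity=2) for _ in range(3)]
fe = ForwardingEngine({"10.0.0.1": "port-3"})
first = forward_lookup("10.0.0.1", path, fe)   # misses every SE, reaches the FE
second = forward_lookup("10.0.0.1", path, fe)  # hits in the first SE
```

In this sketch the second request for the same address is answered by the SE nearest the LC, so `fe.lookups` stays at 1 after both calls.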