Deep Packet Inspection (DPI) refers to examining both packet header and payload to look for predefined patterns, which is essential for network security, intrusion detection and contentaware switch etc. The increasing line speed and expanding pattern sets make DPI a challenging task. Network Processors (NPs) are chosen to perform DPI due to their packet processing performance and programmability. In this paper, we focus on achieving high performance DPI through exploitation of NP's on-chip resources (particularly memory) and inherent parallel processing capability. We study the parallelism in classical DPI algorithms and construct a memory model for different parallel matching methods. Based on the model, we find the optimal organization of state machines that requires minimal on-chip memory space and guides us to high performance NP architectures for DPI. The performance evaluation experiments show that our method can reduce the memory usage by up to 86%. With an Intel IXP28xx N...