In this work, we present a highly scalable network intrusion detection system on many-core processors. To maximize the NIDS performance, we take advantage of the underlying hardware and adhere to four design principles: shared-nothing architecture, computation offloading, lightweight data structure, and flow offloading. Through the experimental results, we find that our design choices can significantly improve the NIDS performance (79 Gbps with 1514B synthetic packets). We believe that our design decisions can be easily extended to other many-core processors and programmable NICs. Categories and Subject Descriptors C.2.1 [Computer-Communication Networks]: Network Architecture and Design; C.2.3 [Computer-Communication Networks]: Network Operations—Network monitoring General Terms Performance Keywords many-core, network intrusion detection system, parallel, offloading