In this paper, we present the design and implementation of Haetae, a high-performance Suricata-based NIDS on many-core processors (MCPs). Haetae achieves high performance through three design choices. First, Haetae exploits parallelism as much as possible by launching NIDS engines that independently analyze incoming flows at high speed. Second, Haetae fully leverages programmable network interface cards to offload common packet processing tasks from regular cores. In addition, Haetae minimizes redundant memory accesses by keeping the packet metadata structure as small as possible. Third, Haetae dynamically offloads flows to the host-side CPU when the system experiences a high load. This dynamic flow offloading utilizes all of the processing power on a given system, regardless of processor type. Our evaluation shows that Haetae achieves up to 79.3 Gbps for synthetic traffic and 48.5 Gbps for real packet traces. Our system outperforms the best-known GPU-based NIDS by 2.4 times an...