Modern deep packet inspection systems use regular expressions to define patterns of interest in network data streams. Deterministic Finite Automata (DFAs) are commonly used to match regular expressions. DFAs are fast, but can require prohibitively large amounts of memory for the patterns that arise in network applications. Traditional DFA table compression reduces the memory requirement only slightly and costs an additional memory access per input character. Alternative representations of regular expressions, such as NFAs and Delayed Input DFAs (D²FAs), require less memory but sacrifice throughput. In this paper we introduce the Content Addressed Delayed Input DFA (CD²FA), which provides a compact representation of regular expressions that matches the throughput of traditional uncompressed DFAs. A CD²FA addresses successive states of a D²FA using their content, rather than a “content-less” identifier. This makes selected information available earlier in the state traversal process...
Sailesh Kumar, Jonathan S. Turner, John Williams
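
To make the memory/throughput trade-off mentioned in the abstract concrete, the following is a minimal sketch of a delayed-input automaton, not the authors' implementation. It assumes the usual D²FA idea that a state stores only the transitions that differ from those of a default state and falls back to that default state, without consuming input, for every other character. The three-state automaton (it recognises inputs containing "ab" over the alphabet a, b, c) and the names `DFA`, `LABELED`, `DEFAULT`, `run_dfa` and `run_d2fa` are hypothetical and chosen only for illustration. The content addressing of the CD²FA itself is not modelled here.

```python
from itertools import product

# Uncompressed DFA: every state stores one transition per input character.
# State 2 is the accepting, absorbing state reached once "ab" has been seen.
DFA = {
    0: {"a": 1, "b": 0, "c": 0},
    1: {"a": 1, "b": 2, "c": 0},
    2: {"a": 2, "b": 2, "c": 2},
}

# Delayed-input version (illustrative): state 1 keeps only the transition
# that differs from state 0 and names state 0 as its default state.
LABELED = {
    0: {"a": 1, "b": 0, "c": 0},
    1: {"b": 2},
    2: {"a": 2, "b": 2, "c": 2},
}
DEFAULT = {1: 0}


def run_dfa(dfa, start, text):
    """Return the final state; exactly one state lookup per character."""
    state = start
    for ch in text:
        state = dfa[state][ch]
    return state


def run_d2fa(labeled, default, start, text):
    """Return (final state, number of state lookups performed)."""
    state = start
    lookups = 0
    for ch in text:
        lookups += 1
        # Follow default transitions, which consume no input, until a
        # state with a labeled transition for this character is found.
        while ch not in labeled[state]:
            state = default[state]
            lookups += 1
        state = labeled[state][ch]
    return state, lookups


def count_transitions(table):
    """Total number of stored labeled transitions."""
    return sum(len(row) for row in table.values())


if __name__ == "__main__":
    print(run_dfa(DFA, 0, "aab"))                 # 2
    print(run_d2fa(LABELED, DEFAULT, 0, "aab"))   # (2, 4)
    print(count_transitions(DFA), count_transitions(LABELED))  # 9 7
```

On the input "aab" both automata end in the accepting state, but the delayed-input version performs four state lookups for three characters, because the second "a" is resolved through the default transition of state 1. This extra lookup per default transition is the throughput cost that the abstract attributes to D²FAs, paid in exchange for storing fewer labeled transitions.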