Protocol reverse engineering has often been a manual process that is considered time-consuming, tedious and error-prone. To address this limitation, a number of solutions have recently been proposed to allow for automatic protocol reverse engineering. Unfortunately, they are either limited in extracting protocol fields due to lack of program semantics in network traces or primitive in only revealing the flat structure of protocol format. In this paper, we present a system called AutoFormat that aims at not only extracting protocol fields with high accuracy, but also revealing the inherently “non-flat”, hierarchical structures of protocol messages. AutoFormat is based on the key insight that different protocol fields in the same message are typically handled in different execution contexts (e.g., the runtime call stack). As such, by monitoring the program execution, we can collect the execution context information for every message byte (annotated with its offset in the entire...