This paper concerns the fundamental problem of identifying the content nature of a flow, namely text, binary, or encrypted, for the first time. We propose Iustitia, a tool for identifying flow nature on the fly. The key observation behind Iustitia is that text flows have the lowest entropy and encrypted flows have the highest entropy, while the entropy of binary flows stands in between. The basic idea of Iustitia is to classify flows using machine learning techniques where a feature is the entropy of every certain number of consecutive bytes. The key features of Iustitia are high speed (10% of average packet inter-arrival time) and high accuracy (86%).
Amir R. Khakpour, Alex X. Liu