Packet sampling is widely used in network monitoring. Sampled packet streams are often used to determine flow-level statistics of network traffic. To date there is conflicting evidence on the quality of the resulting estimates. In this paper we take a systematic approach, using the Fisher information metric and the Cramér-Rao bound, to understand the contributions that different types of information within sampled packets have on the quality of flow-level estimates. We provide concrete evidence that, without protocol information and with packet sampling rate p = 0.005, any accurate unbiased estimator needs approximately 10^16 sampled flows. The required number of sampled flows drops to roughly 10^4 with the use of TCP sequence numbers. Furthermore, additional SYN flag information significantly reduces the estimation error of short flows. We present a Maximum Likelihood Estimator (MLE) that relies on all of this information and show that it is efficient, even when applied to a...
Bruno F. Ribeiro, Donald F. Towsley, Tao Ye, Jean