Measurement, collection, and interpretation of network usage data commonly involve multiple stages of sampling and aggregation. Examples include sampling packets, aggregating them into flow statistics at a router, and sampling and aggregating usage records in a network data repository for reporting, querying, and archiving. Although unbiased estimates of packet, byte, and flow usage can be formed for each sampling operation, many applications also need to know the inherent estimation error. Previous work in this area has mainly been limited to analyzing the estimator variance for particular methods, e.g., independent packet sampling. However, the variance is of limited use for more general sampling methods, where the estimate may not be well approximated by a Gaussian distribution. This motivates our paper, in which we establish Chernoff bounds on the likelihood of estimation error in a general multistage combination of measurement sampling and aggregation. We derive the scale aga...
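To make the single-stage case concrete, here is a minimal sketch of the standard inverse-probability (Horvitz-Thompson) byte estimator under independent packet sampling. Each packet is kept with probability `p`, and each kept packet of size `b` contributes `b / p`. The estimate is unbiased, and its variance is the sum of `b**2 * (1 - p) / p` over all packets. The function names are hypothetical illustrations. This is not the paper's multistage scheme or its Chernoff bounds.

```python
import random
from typing import Sequence


def sample_packets(sizes: Sequence[int], p: float, rng: random.Random) -> list[int]:
    """Hypothetical helper: keep each packet independently with probability p."""
    return [b for b in sizes if rng.random() < p]


def estimate_bytes(sampled: Sequence[int], p: float) -> float:
    """Horvitz-Thompson estimate: each sampled packet stands for b / p bytes."""
    return sum(b / p for b in sampled)


def estimator_variance(sizes: Sequence[int], p: float) -> float:
    """Exact variance of estimate_bytes under independent sampling with probability p."""
    return sum(b * b for b in sizes) * (1 - p) / p


if __name__ == "__main__":
    rng = random.Random(1)
    sizes = [40] * 50 + [1500] * 50  # illustrative packet sizes, not measured data
    p = 0.1
    trials = [estimate_bytes(sample_packets(sizes, p, rng), p) for _ in range(20000)]
    print("true total:", sum(sizes))
    print("mean estimate:", sum(trials) / len(trials))
    print("std dev:", estimator_variance(sizes, p) ** 0.5)
```

The variance grows with the squared packet sizes, so a few large packets can dominate the error. This is one reason a variance figure alone may say little about tail probabilities when the estimate is far from Gaussian.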
Edith Cohen, Nick G. Duffield, Carsten Lund, Mikke