This paper describes a measurement infrastructure used to collect detailed IP traffic measurements from an IP backbone. Usage, i.e., bytes transmitted, is determined from raw NetFlow records generated by the backbone routers. The amount of raw data is immense. Two types of data sampling are used to manage data volumes: (i) (packet) sampled NetFlow in the routers; (ii) size-dependent sampling of NetFlow records. Furthermore, dropping of NetFlow records in transmission can be regarded as an uncontrolled form of sampling. We show how to manage the trade-off between estimation accuracy and data volume. Firstly, we describe the sampling error that arises from all three types of sampling when estimating usage per traffic class: how it can be predicted from models and raw data, and how it can be estimated directly from the sampled data itself. Secondly, we show how to determine the usage of resources (bandwidth, computational cycles, storage) within the components of the infrastructure. T...
Nick G. Duffield, Carsten Lund
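
To make the idea of size-dependent sampling of NetFlow records concrete, the following is a minimal sketch and not the infrastructure's actual implementation. It assumes one common form of size-dependent sampling that the abstract does not spell out: a record of x bytes is kept with probability min(1, x/z) for a threshold z, and each kept record is reweighted to max(x, z) so that summed weights give an unbiased usage estimate. The function name `sample_records`, the threshold value, and the record sizes are all hypothetical.

```python
import random


def sample_records(sizes, z, rng):
    """Size-dependent sampling of flow records (illustrative assumption).

    A record of x bytes is kept with probability p = min(1, x / z).
    Each kept record is reweighted to x / p = max(x, z), so the sum of
    the returned weights is an unbiased estimate of total bytes.
    """
    kept = []
    for x in sizes:
        p = min(1.0, x / z)
        if rng.random() < p:
            kept.append(max(x, z))
    return kept


# Hypothetical byte counts for eight flow records and a hypothetical threshold.
rng = random.Random(0)
sizes = [40, 1500, 120, 9000, 60000, 300, 75, 250000]
z = 10000

# Repeat the sampling many times to compare the average estimate with the truth.
estimates = [sum(sample_records(sizes, z, rng)) for _ in range(20000)]
mean_estimate = sum(estimates) / len(estimates)
print("true usage:", sum(sizes), "mean estimate:", round(mean_estimate))
```

Under this assumed scheme, records of at least z bytes are always kept, so the large flows that dominate usage contribute no sampling error, while the many small records are thinned out; raising z reduces data volume at the cost of higher estimation variance.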