The performance of computer systems depends, among other things, on the workload. Performance evaluations are therefore often done using logs of workloads on current production systems, under the assumption that such real workloads are representative and reliable; likewise, workload modeling is typically based on real workloads. We show, however, that real workloads may also contain anomalies that make them non-representative and unreliable. This is a special case of multi-class workloads, where one class is the “real” workload which we wish to use in the evaluation, and the other class contaminates the log with “bogus” data. We provide several examples of this situation, including a previously unrecognized type of anomaly we call “workload flurries”: surges of activity with a repetitive nature, caused by a single user, that dominate the workload for a relatively short period. Using a workload with such anomalies in effect emphasizes rare and unique events (e.g. occurrin...
Dror G. Feitelson, Dan Tsafrir