Performing Quality of Service (QoS) control in large computing systems requires an online metric that is representative of the real state of the system. The Tardiness Quantile Metric (TQM), introduced earlier, allows QoS control by efficiently measuring how close the system is to the specified QoS, assuming specific distributions. In this paper, we generalize this idea and propose the Generalized Tardiness Quantile Metric (GTQM). Using an online convergent sequential process defined from a Markov chain, we derive quantile estimates that do not depend on the shape of the workload probability distribution. We then use GTQM to control QoS in a fine-grained manner, saving energy in soft real-time web clusters. To evaluate the new metric, we present practical results from a real web cluster running Linux, Apache, and MySQL under our QoS control, for both a deterministic workload and an e-commerce workload. The results show that the GTQM method has excellent workload prediction c...
Luciano Bertini, Julius C. B. Leite, Daniel Mossé