Sciweavers

EUROSYS
2010
ACM

Fingerprinting the Datacenter: Automated Classification of Performance Crises

When a performance crisis occurs in a datacenter, rapid recovery requires quickly recognizing whether a similar incident has occurred before, in which case a known remedy may apply, or whether the problem is new, in which case new troubleshooting is necessary. To address this issue, we propose a new and efficient representation of the datacenter's state, a fingerprint, that scales linearly with the number of performance metrics considered and is not affected by the number of machines. These fingerprints are generated online and then used as unique identifiers of the different types of performance crises, so that we can effectively recognize previous occurrences and retrieve repair actions. We evaluate our approach on a production datacenter with hundreds of machines running a 24x7 enterprise-class user-facing application, verifying each identification result with the operators of the datacenter and against troubleshooting tickets. Our approach has 80% identification accuracy in the operati...
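The abstract does not spell out how a fingerprint is built, so the following is only a minimal sketch of one way to get a representation whose length depends on the number of metrics and not on the number of machines. It assumes each metric is summarized across machines by its quartiles, each quartile is marked cold, normal, or hot against a per-metric threshold pair, and crises are matched by Euclidean distance. The function names, metric names, thresholds, and crisis labels are all hypothetical.

```python
import math
from statistics import quantiles


def fingerprint(snapshot, thresholds):
    """Build a fixed-length fingerprint from per-machine metric values.

    snapshot:   {metric_name: [one value per machine]}  (hypothetical layout)
    thresholds: {metric_name: (low, high)}              (assumed known in advance)
    Returns a list with 3 entries per metric: -1 (cold), 0 (normal), 1 (hot).
    """
    fp = []
    for metric in sorted(snapshot):
        low, high = thresholds[metric]
        # Three quartile cut points summarize the metric across all machines,
        # so the result does not grow with the number of machines.
        for q in quantiles(snapshot[metric], n=4):
            fp.append(-1 if q < low else 1 if q > high else 0)
    return fp


def distance(a, b):
    """Euclidean distance between two fingerprints of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(fp, known, max_distance):
    """Return the label of the nearest known crisis, or None if none is close."""
    best_label, best_dist = None, float("inf")
    for label, known_fp in known.items():
        d = distance(fp, known_fp)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label if best_dist <= max_distance else None


# Hypothetical thresholds and snapshots for illustration only.
thresholds = {"cpu": (20, 80), "latency_ms": (50, 200)}
small = {"cpu": [90, 95, 92, 97], "latency_ms": [100, 110, 120, 105]}
large = {"cpu": [90, 95, 92, 97, 91, 96, 93, 94],
         "latency_ms": [100, 110, 120, 105, 115, 108, 112, 101]}

fp_small = fingerprint(small, thresholds)
fp_large = fingerprint(large, thresholds)

known = {"overloaded_frontend": [1, 1, 1, 0, 0, 0],
         "db_stall": [0, 0, 0, 1, 1, 1]}
match = identify(fp_small, known, max_distance=1.0)
no_match = identify([-1] * 6, known, max_distance=1.0)
```

In this sketch, a four-machine snapshot and an eight-machine snapshot both yield a six-entry fingerprint (two metrics times three quartiles). A fingerprint that is far from every stored one returns None, which corresponds to treating the crisis as new.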
Added 09 Aug 2010
Updated 09 Aug 2010
Type Conference
Year 2010
Where EUROSYS
Authors Peter Bodik, Moises Goldszmidt, Armando Fox, Dawn B. Woodard, Hans Andersen