

Capturing, indexing, clustering, and retrieving system history

We present a method for automatically extracting from a running system an indexable signature that distills the essential characteristics of a system state and that can be subjected to automated clustering and similarity-based retrieval to identify when an observed system state is similar to a previously observed state. This allows operators to identify and quantify the frequency of recurrent problems, to leverage previous diagnostic efforts, and to establish whether problems seen at different installations of the same site are similar or distinct. We show that the naive approach to constructing these signatures, which simply records the actual “raw” values of collected measurements, is ineffective, leading us to a more sophisticated approach based on statistical modeling and inference. Our method requires only that the system’s metric of merit (such as average transaction response time) as well as a collection of lower-level operational metrics be collected, as is done b...
Added: 17 Mar 2010
Updated: 17 Mar 2010
Type: Conference
Year: 2005
Where: SOSP
Authors: Ira Cohen, Steve Zhang, Moisés Goldszmidt, Julie Symons, Terence Kelly, Armando Fox