
ICML
2010

Detecting Large-Scale System Problems by Mining Console Logs

Surprisingly, console logs rarely help operators detect problems in large-scale datacenter services, because they often consist of a voluminous intermixing of messages from many software components written by independent developers. We propose a general methodology to mine this rich source of information to automatically detect system runtime problems. We use a combination of program analysis and information retrieval techniques to transform free-text console logs into numerical features that capture sequences of events in the system. We then analyze these features using machine learning to detect operational problems. We also show how to distill the results of our analysis into an operator-friendly one-page decision tree showing the critical messages associated with the detected problems. In addition, we extend our methods to online problem detection, where the sequences of events are continuously generated as data streams.
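The pipeline described above (free-text logs to numerical features to anomaly detection) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The message templates are hypothetical hand-written regexes (the abstract says the paper derives message structure through program analysis). Messages are assumed to be grouped by an identifier such as a block ID into per-identifier message count vectors. For detection, the abstract says only "machine learning"; this sketch assumes a PCA subspace method, flagging vectors with a large residual outside the top-k principal components, computed with plain power iteration so it needs no third-party libraries. The names `TEMPLATES`, `count_vectors`, `top_components`, and `residual_scores` are illustrative.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical message templates (an assumption for illustration).
TEMPLATES = {
    "receiving": re.compile(r"Receiving block (?P<id>blk_-?\d+)"),
    "received": re.compile(r"Received block (?P<id>blk_-?\d+)"),
    "deleting": re.compile(r"Deleting block (?P<id>blk_-?\d+)"),
}

def count_vectors(lines):
    """Group matched messages by identifier into message count vectors."""
    counts = defaultdict(Counter)
    for line in lines:
        for event, pattern in TEMPLATES.items():
            m = pattern.search(line)
            if m:
                counts[m.group("id")][event] += 1
                break
    events = sorted(TEMPLATES)  # fixed feature order
    return events, {i: [c[e] for e in events] for i, c in counts.items()}

def top_components(cov, k, iters=500):
    """Top-k eigenvectors of a covariance matrix via power iteration + deflation."""
    d = len(cov)
    comps = []
    for _ in range(k):
        vec = [1.0 + j for j in range(d)]  # arbitrary start, assumed not orthogonal
        for _ in range(iters):
            w = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                return comps  # no variance left to explain
            vec = [x / norm for x in w]
        lam = sum(vec[a] * cov[a][b] * vec[b] for a in range(d) for b in range(d))
        comps.append(vec)
        # Deflate so the next pass finds the next-largest component.
        cov = [[cov[a][b] - lam * vec[a] * vec[b] for b in range(d)] for a in range(d)]
    return comps

def residual_scores(vectors, k=1):
    """Squared norm of each centered vector outside the top-k PCA subspace."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    centered = [[v[j] - mean[j] for j in range(d)] for v in vectors]
    cov = [[sum(x[a] * x[b] for x in centered) / n for b in range(d)] for a in range(d)]
    comps = top_components(cov, k)
    scores = []
    for x in centered:
        resid = list(x)
        for c in comps:  # subtract projection onto each principal component
            p = sum(r * ci for r, ci in zip(resid, c))
            resid = [r - p * ci for r, ci in zip(resid, c)]
        scores.append(sum(r * r for r in resid))
    return scores

# Synthetic log: normal blocks are received as often as they are being received.
log = []
for i, r in enumerate([2, 3, 2, 3, 2, 3]):
    blk = f"blk_{i}"
    log += [f"Receiving block {blk}"] * r + [f"Received block {blk}"] * r
    log.append(f"Deleting block {blk}")
# Anomalous block: three "Receiving" messages but only one "Received".
log += ["Receiving block blk_99"] * 3 + ["Received block blk_99", "Deleting block blk_99"]

events, vecs = count_vectors(log)
ids = list(vecs)
scores = residual_scores([vecs[i] for i in ids], k=1)
print(max(zip(scores, ids))[1])  # expected: blk_99
```

Count vectors keep the features numeric and order-independent within an identifier, which is what lets a linear method like PCA separate the common correlation ("receiving" and "received" rise together) from vectors that break it. In practice a threshold on the residual score, rather than simply taking the maximum, would decide which identifiers are reported.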
Added 09 Nov 2010
Updated 09 Nov 2010
Type Conference
Year 2010
Where ICML
Authors Wei Xu, Ling Huang, Armando Fox, David Patterson, Michael I. Jordan