

Log summarization and anomaly detection for troubleshooting distributed systems

Today’s system monitoring tools are capable of detecting system failures such as host failures, OS errors, and network partitions in near-real time. Unfortunately, the same cannot yet be said of the end-to-end distributed software stack. Any given action, for example, reliably transferring a directory of files, can involve a wide range of complex and interrelated actions across multiple pieces of software: checking user certificates and permissions, getting details for all files, performing third-party transfers, understanding retry policy decisions, etc. We present an infrastructure for troubleshooting complex middleware, a general-purpose technique for configurable log summarization, and an anomaly detection technique that works in near-real time on running Grid middleware. We present results gathered using this infrastructure from instrumented Grid middleware and applications running on the Emulab testbed. From these results, we analyze the effectiveness of several algori...
Added 07 Jun 2010
Updated 07 Jun 2010
Type Conference
Year 2007
Where GRID
Authors Dan Gunter, Brian Tierney, Aaron Brown, D. Martin Swany, John Bresnahan, Jennifer M. Schopf