Sciweavers

SIGSOFT
2010
ACM

Finding latent performance bugs in systems implementations

13 years 9 months ago
Finding latent performance bugs in systems implementations
Robust distributed systems commonly employ high-level recovery mechanisms enabling the system to recover from a wide variety of problematic environmental conditions such as node failures, packet drops and link disconnections. Unfortunately, these recovery mechanisms also effectively mask additional serious design and implementation errors, disguising them as latent performance bugs that severely degrade end-to-end system performance. These bugs typically go unnoticed due to the challenge of distinguishing between a bug and an intermittent environmental condition that must be tolerated by the system. We present techniques that can automatically pinpoint latent performance bugs in systems implementations, in the spirit of recent advances in model checking by systematic state space exploration. The techniques proceed by automating the process of conducting random simulations, identifying performance anomalies, and analyzing anomalous executions to pinpoint the circumstances leading to pe...
Charles Edwin Killian, Karthik Nagaraj, Salman Per
Added 15 Feb 2011
Updated 15 Feb 2011
Type Journal
Year 2010
Where SIGSOFT
Authors Charles Edwin Killian, Karthik Nagaraj, Salman Pervez, Ryan Braud, James W. Anderson, Ranjit Jhala
Comments (0)