Today’s distributed systems need runtime error detection to catch errors arising from software bugs, hardware errors, or unexpected operating conditions. A prominent class of error detection techniques operates in a stateful manner, i.e., it keeps track of the state of the application being monitored and then matches state-based rules. Large-scale distributed applications generate a high volume of messages that can overwhelm the capacity of a stateful detection system. An existing approach to handle this is to randomly sample the messages and process a subset. However, this approach, leads to non-determinism with respect to the detection system’s view of what state the application is in. This in turn leads to degradation in the quality of detection. We present an intelligent sampling algorithm and a Hidden Markov Model (HMM)-based algorithm to select the messages that the detection system processes and determine the application states such that the non-determinism is minimized. We ...
Ignacio Laguna, Fahad A. Arshad, David M. Grothe,