Learning from software failures is an essential step towards the development of more reliable software systems and processes. However, as more intricate software systems are developed, determining the nature and causes of a software failure becomes a major challenge. Although many existing techniques can help to understand the nature of a failure, they are limited in one or more of the following ways. First, they work only within controlled environments. Second, they have a major impact on the behavior of the target system. Third, they assume that the failure can be reproduced. Fourth, they provide insufficient support for carrying out a structured failure analysis. In this paper, we present the Software Black Box (SBB) as an alternative mechanism for failure investigation. The SBB differs from its predecessors in that it was specifically designed to be embedded in a target system and to assist in failure investigation by reconstructing the events that led to the failure. The SBB architecture ...
Sebastian G. Elbaum, John C. Munson
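The abstract does not describe the SBB architecture in this excerpt, so the following is only a hypothetical sketch of the general flight-recorder idea that the name evokes. It is not the authors' design. An embedded component keeps a bounded history of recent events at low cost and hands that history over when a failure occurs. The events leading up to the failure can then be examined without reproducing it. The names `Event`, `BlackBoxRecorder`, and `run_with_black_box`, the fixed capacity, and the event fields are all assumptions made for illustration.

```python
from __future__ import annotations

import time
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical event record; the fields are illustrative assumptions."""
    seq: int
    timestamp: float
    module: str
    description: str


class BlackBoxRecorder:
    """Hypothetical in-process recorder that retains only the most recent events."""

    def __init__(self, capacity: int = 1024) -> None:
        # A deque with maxlen silently discards the oldest entry when full,
        # so memory use stays bounded however long the system runs.
        self._events: deque[Event] = deque(maxlen=capacity)
        self._seq = 0

    def record(self, module: str, description: str) -> None:
        # Cheap append on the normal execution path, to keep overhead low.
        self._seq += 1
        self._events.append(Event(self._seq, time.monotonic(), module, description))

    def dump(self) -> list[Event]:
        # Snapshot of the retained history, oldest event first.
        return list(self._events)


def run_with_black_box(recorder: BlackBoxRecorder, action) -> list[Event] | None:
    """Run action; if it raises, log the failure and return the retained history."""
    try:
        action()
        return None
    except Exception as exc:
        recorder.record("black_box", f"failure: {exc!r}")
        return recorder.dump()
```

For example, a recorder with `capacity=3` that has logged five events and then observes an exception returns only the two most recent events plus the failure entry. This bounded window is the trade-off such a design makes: it keeps the run-time cost and memory footprint small, at the price of losing history that falls outside the window.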