When something unexpected happens in a large production system, administrators must first search for the components and component interactions that are likely to be involved. The system may consist of thousands of interacting subsystems, the logging instrumentation may be noisy or incomplete, and the problem description may be vague, so this search is often the most difficult part of understanding the system’s behavior. To facilitate the search process, we present a query language and a method for computing these queries that makes minimal assumptions about the available
Adam J. Oliner, Alex Aiken