Many of the data sources used in stream query processing are known to exhibit bursty behavior. Data in a burst often has different characteristics than steady-state data, and therefore may be of particular interest. In this paper, we describe the Data Triage architecture that we have added to TelegraphCQ to react to such bursts. When a burst of data requires load-shedding, TelegraphCQ chooses tuples to remove from the data flow. The Data Triage component then constructs synopses of these tuples and uses a fast but approximate shadow query plan to estimate the query results that the system did not have time to compute. These results are then combined with the system's standard query results to capture the properties of the entire input. We describe how we leveraged the object-relational features of TelegraphCQ to implement Data Triage entirely outside the system's standard-case query engine. Through a series of experiments using real-time measurements of the system, we show t...
Frederick Reiss, Joseph M. Hellerstein