Streams of data often originate from many distributed sources. A distributed stream processing system publishes such streams and enables queries over them. This allows users to retrieve and relate data from the distributed streams without needing to know where the streams are located. Stream data is important not only for its current values but also for the values produced in the past. To support this, the history of each stream must be archived, and stream processing systems must support history queries. A problem that then arises is that data streams published by distributed sources may have missing data values, e.g. due to a network failure. When a stream has missed values, its stored history contains gaps. This paper considers the effects of missing information on the answers generated for history queries. The assumptions about the data streams are analysed so that techniques for detecting missing values can be developed. A model for represent...
Alasdair J. G. Gray, Werner Nutt, M. Howard Willia