Finding structure in multiple streams of data is an important problem. Consider the streams of data flowing from a robot's sensors, the monitors in an intensive care unit, or periodic measurements of various indicators of the health of the economy. There is clearly utility in determining how current and past values in those streams are related to future values. We formulate the problem of finding structure in multiple streams of categorical data as search over the space of dependencies (unexpectedly frequent or infrequent co-occurrences) between complex patterns of values that can appear in the streams. Based on that formulation, we develop the Multi-Stream Dependency Detection (MSDD) algorithm, which performs an efficient systematic search over the space of all possible dependencies. Dependency strength is evaluated with a statistical measure of nonindependence, and bounds that we derive for the value of that measure allow the search to be pruned. Due to the pruning, MSDD can find the k s...
Tim Oates, Paul R. Cohen
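
To make the notion of a dependency concrete, the sketch below scores a single candidate dependency between two patterns in a set of categorical streams. It is an illustration under stated assumptions, not the MSDD search itself. The abstract does not name its measure of nonindependence, so the G statistic (a log-likelihood ratio on a 2x2 contingency table) is used here as one common choice. The function names `contingency_table` and `g_statistic` are hypothetical, and patterns are restricted to a single time step for brevity.

```python
import math


def contingency_table(streams, precursor, successor, lag=1):
    """Count co-occurrences of two patterns separated by `lag` time steps.

    streams:   list of equal-length sequences of categorical values.
    precursor: dict mapping stream index -> required value at time t.
    successor: dict mapping stream index -> required value at time t + lag.
    Returns (n11, n12, n21, n22): both match, precursor only,
    successor only, neither.
    """
    length = len(streams[0])

    def matches(pattern, t):
        return all(streams[s][t] == value for s, value in pattern.items())

    n11 = n12 = n21 = n22 = 0
    for t in range(length - lag):
        p = matches(precursor, t)
        s = matches(successor, t + lag)
        if p and s:
            n11 += 1
        elif p:
            n12 += 1
        elif s:
            n21 += 1
        else:
            n22 += 1
    return n11, n12, n21, n22


def g_statistic(n11, n12, n21, n22):
    """G statistic for a 2x2 table (assumed measure, see lead-in).

    Larger values indicate a stronger departure from independence;
    a value of 0 means observed counts equal the expected counts.
    """
    total = n11 + n12 + n21 + n22
    rows = (n11 + n12, n21 + n22)
    cols = (n11 + n21, n12 + n22)
    observed = ((n11, n12), (n21, n22))
    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute nothing to the sum
                expected = rows[i] * cols[j] / total
                g += o * math.log(o / expected)
    return 2.0 * g


# Two toy streams in which 'A' in stream 0 is always followed by 'Y' in stream 1.
streams = ["ABABABAB", "XYXYXYXY"]
table = contingency_table(streams, {0: "A"}, {1: "Y"}, lag=1)
score = g_statistic(*table)
```

A search in the spirit of the abstract would evaluate many such precursor/successor pairs and keep the highest-scoring ones; the pruning bounds mentioned above are not reproduced here.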