We consider the problem of identifying periodic trends in data streams. We say a signal a ∈ ℝ^n is p-periodic if a_i = a_{i+p} for all i ∈ [n − p]. Recently, Ergün et al. [4] presented a one-pass, O(polylog n)-space algorithm for identifying the smallest period of a signal. Their algorithm required a to be presented in the time-series model, i.e., a_i is the i-th element in the stream. We present a more general linear sketch algorithm that has the advantages of being applicable to a) the turnstile stream model, where coordinates can be incremented/decremented in an arbitrary fashion, and b) the parallel or distributed setting, where the signal is distributed over multiple locations/machines. We also present sketches for (1+ε)-approximating the ℓ_2 distance between a and the nearest p-periodic signal for a given p. Our algorithm uses O(ε^{-2} polylog n) space, comparing favorably to an earlier time-series result that used O(ε^{-5.5} √p polylog n) space for estimating the Hamming distance to...
Michael S. Crouch, Andrew McGregor
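
To make the definitions in the abstract concrete, the following is a minimal offline sketch in Python. It is not the linear sketch algorithm of the paper: it stores the whole signal in memory and checks the definition of p-periodicity directly. The function names are hypothetical, chosen only for illustration. The distance computation relies on the standard fact that the ℓ_2-nearest p-periodic signal replaces each entry by the mean of the entries sharing its index modulo p.

```python
import math


def is_periodic(a, p):
    """Return True if a[i] == a[i + p] for every index i with i + p < len(a)."""
    return all(a[i] == a[i + p] for i in range(len(a) - p))


def smallest_period(a):
    """Return the smallest p >= 1 for which a is p-periodic.

    Every signal is trivially len(a)-periodic, so that is the fallback.
    """
    for p in range(1, len(a)):
        if is_periodic(a, p):
            return p
    return len(a)


def l2_distance_to_periodic(a, p):
    """Exact l2 distance from a to the nearest p-periodic signal.

    Entries whose indices agree modulo p must be equal in a p-periodic
    signal, and the squared error within each such class is minimized
    by the class mean.
    """
    total = 0.0
    for r in range(p):
        residue_class = a[r::p]  # entries at indices r, r + p, r + 2p, ...
        if not residue_class:
            continue
        mean = sum(residue_class) / len(residue_class)
        total += sum((x - mean) ** 2 for x in residue_class)
    return math.sqrt(total)
```

For example, `smallest_period([1, 2, 3, 1, 2, 3, 1])` evaluates to 3, and `l2_distance_to_periodic` on the same signal with `p = 3` is zero. This baseline needs space linear in n, which is the cost that the polylogarithmic-space sketches described above are designed to avoid.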