Mining frequent itemsets in data streams benefits many real-world applications, but it is a challenging task because data streams are unbounded and have high arrival rates. Moreover, the distribution of a data stream can change over time, which makes maintaining frequent itemsets even harder. In this paper, we propose a false-negative oriented algorithm, called TWIM, that can find most of the frequent itemsets, detect distribution changes, and update the mining results accordingly. Experimental results show that our algorithm performs as well as other false-negative algorithms on data streams without distribution change, and that it can detect changes in time-varying data streams in real time with high accuracy.

Categories and Subject Descriptors
H.2.8 [Database Management]: Database Applications – Data Mining

General Terms
Algorithms, Performance, Experimentation

Keywords
Data stream, Frequent itemset
Yingying Tao, M. Tamer Özsu