Mining data streams of changing class distributions is important for real-time business decision support. The stream classifier must evolve to reflect the current class distribution. This poses a serious challenge. On the one hand, relying on historical data may increase the chances of learning obsolete models. On the other hand, learning only from the latest data may lead to biased classifiers, as the latest data is often an unrepresentative sample of the current class distribution. The problem is particularly acute in classifying rare events, when, for example, instances of the rare class do not even show up in the most recent training data. In this paper, we use a stochastic model to describe the concept shifting patterns and formulate this problem as an optimization one: from the historical and the current training data that we have observed, find the most-likely current distribution, and learn a classifier based on the most-likely distribution. We derive an analytic solution and ...
Haixun Wang, Jian Yin, Jian Pei, Philip S. Yu, Jef