In this paper, we present a new technique, called Stream Projected Ouliter deTector (SPOT), to deal with outlier detection problem in high-dimensional data streams. SPOT is unique in a number of aspects. First, SPOT employs a novel window-based time model and decaying cell summaries to capture statistics from the data stream. Second, Sparse Subspace Template (SST), a set of top sparse subspaces obtained by unsupervised and/or supervised learning processes, is constructed in SPOT to detect projected outliers effectively. Multi-Objective Genetic Algorithm (MOGA) is employed as an effective search method in unsupervised learning for finding outlying subspaces from training data. Finally, SST is able to carry out online selfevolution to cope with dynamics of data streams. This paper provides details on the motivation and technical challenges of detecting outliers from high-dimensional data streams, present an overview of SPOT, and give the plans for system demonstration of SPOT.
Ji Zhang, Qigang Gao, Hai H. Wang