This paper describes our work on learning online models that forecast real-valued variables in a high-dimensional space. A 3 GB database was collected by sampling 421 real-valued sensors in a cement manufacturing plant once every minute for several months. The goal is to learn models that, every minute, forecast the values of all 421 sensors for the next hour. The underlying process is highly non-stationary: there are abrupt changes in sensor behavior (time-frame: minutes), semi-periodic behavior (time-frame: hours to days), and slow long-term drift in plant dynamics (time-frame: weeks to months). Therefore, the models need to adapt online as new data is received, and all learning and prediction must occur in real time (i.e., within one minute). The learning methods must also deal with two forms of data corruption: large amounts of data are missing, and what is recorded is very noisy. We have developed a framework with multiple levels of adaptation in which several thousand incremental learning algorit...
R. Bharat Rao, Scott Rickard, Frans Coetzee