The Application of AdaBoost for Distributed, Scalable and On-Line Learning

14 years 8 months ago


We propose to use AdaBoost to efficiently learn classifiers over very large and possibly distributed data sets that cannot fit into main memory, as well as for on-line learning, where new data become available periodically. We propose two new ways to apply AdaBoost. The first allows the use of a small sample of the weighted training set to compute a weak hypothesis. The second uses AdaBoost to re-weight the classifiers in an ensemble, and thus to reuse previously computed classifiers along with a new classifier computed on a new increment of data. These two techniques provide scalable, distributed and on-line learning. We discuss these methods and their implementation in JAM, an agent-based learning system. Empirical studies on four real-world and artificial data sets have shown results that are either comparable to or better than learning classifiers over the complete training set and, in some cases, are comparable to boosting on the comple...
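The two uses of AdaBoost described above can be illustrated with a minimal sketch. It is not the authors' JAM implementation: the one-dimensional threshold stump, the toy data, and every function name (`train_stump`, `boost_step`, `boost_with_sampling`, `reweight_classifiers`, `predict`) are assumptions made for illustration, and the update is the standard binary AdaBoost rule with labels in {-1, +1}.

```python
import math
import random

def train_stump(sample):
    """Hypothetical weak learner: best 1-D threshold stump on (x, y) pairs."""
    best = None
    for thr in sorted({x for x, _ in sample}):
        for sign in (1, -1):
            errors = sum(1 for x, y in sample if (sign if x >= thr else -sign) != y)
            if best is None or errors < best[0]:
                best = (errors, thr, sign)
    _, thr, sign = best
    return lambda x: sign if x >= thr else -sign

def weighted_error(h, data, weights):
    return sum(w for w, (x, y) in zip(weights, data) if h(x) != y)

def boost_step(h, data, weights):
    """One standard AdaBoost update; None if h is no better than chance."""
    err = weighted_error(h, data, weights)
    if err >= 0.5:
        return None
    err = max(err, 1e-10)  # avoid division by zero for a perfect hypothesis
    alpha = 0.5 * math.log((1 - err) / err)
    new = [w * math.exp(-alpha * y * h(x)) for w, (x, y) in zip(weights, data)]
    total = sum(new)
    return alpha, [w / total for w in new]

def boost_with_sampling(data, rounds, sample_size, rng):
    """First idea: fit each weak hypothesis on a small weighted sample only."""
    weights = [1.0 / len(data)] * len(data)
    ensemble = []
    for _ in range(rounds):
        sample = rng.choices(data, weights=weights, k=sample_size)
        step = boost_step(h := train_stump(sample), data, weights)
        if step is None:
            continue  # discard this hypothesis and draw a fresh sample
        alpha, weights = step
        ensemble.append((alpha, h))
    return ensemble

def reweight_classifiers(classifiers, data, rounds):
    """Second idea: AdaBoost only assigns weights to already-trained classifiers."""
    weights = [1.0 / len(data)] * len(data)
    ensemble = []
    for _ in range(rounds):
        h = min(classifiers, key=lambda c: weighted_error(c, data, weights))
        step = boost_step(h, data, weights)
        if step is None:
            break
        alpha, weights = step
        ensemble.append((alpha, h))
    return ensemble

def predict(ensemble, x):
    return 1 if sum(alpha * h(x) for alpha, h in ensemble) >= 0 else -1

# Toy usage: learn on an initial data set, then reuse the old classifiers
# together with one new classifier trained on a new increment of data.
rng = random.Random(0)
data = [(x, 1 if x >= 10 else -1) for x in range(20)]
ensemble = boost_with_sampling(data, rounds=10, sample_size=8, rng=rng)
new_data = [(x, 1 if x >= 12 else -1) for x in range(20)]
old = [h for _, h in ensemble]
combined = reweight_classifiers(old + [train_stump(new_data)], new_data, rounds=5)
```

In this sketch the weak learner only ever sees `sample_size` examples per round, while the weight update still scans the full data set once; in a distributed setting that scan would presumably be split across sites, which the sketch does not attempt to model.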

Wei Fan, Salvatore J. Stolfo, Junxin Zhang
