We propose to use AdaBoost to efficiently learn classifiers over very large and possibly distributed data sets that cannot fit into main memory, as well as on-line learning where new data become available periodically. We propose two new ways to apply AdaBoost. The first allows the use of a small sample of the weighted training set to compute a weak hypothesis. The second approach involves using AdaBoost as a means to re-weight classifiers in an ensemble, and thus to reuse previously computed classifiers along with new classifier computed on a new increment of data. These two techniques of using AdaBoost provide scalable, distributed and on-line learning. We discuss these methods and their implementation in JAM, an agent-based learning system. Empirical studies on four real world and artifical data sets have shown results that are either comparable to or better than learning classifiers over the complete training set and, in some cases, are comparable to boosting on the comple...
Wei Fan, Salvatore J. Stolfo, Junxin Zhang