We consider boosting algorithms that maintain a distribution over a set of examples. At each iteration a weak hypothesis is received and the distribution is updated. We motivate these updates as minimizing the relative entropy to the last distribution subject to linear constraints. For example, AdaBoost constrains the edge of the last hypothesis w.r.t. the updated distribution to be at most γ = 0 (equivalently, its weighted error is constrained to be exactly half). In some sense, AdaBoost is "corrective" w.r.t. the last hypothesis. A more principled boosting method is to be "totally corrective", in that the edges of all past hypotheses are constrained to be at most γ, where γ is suitably adapted. Using new techniques, we prove the same iteration bounds for the totally corrective algorithms as for their corrective versions. Moreover, with adaptive γ, the algorithms can be shown to provably maximize the margin. Experimentally, the totally corrective versions clearly outperform their corrective counterparts.
Gunnar Rätsch, Jun Liao, Manfred K. Warmuth
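
As a rough illustration of the totally corrective update described above, the following Python sketch computes the relative-entropy projection of the previous distribution onto the set of distributions under which the edges of all past hypotheses are at most γ. This is not the authors' implementation; the names (d_prev, U, gamma), the use of a generic SLSQP solver, and the fixed γ are assumptions made only for illustration, and the adaptive choice of γ is not shown.

    # Hypothetical sketch of a totally corrective distribution update:
    # minimize the relative entropy to the previous distribution subject to
    # the edges of all past hypotheses being at most gamma.
    import numpy as np
    from scipy.optimize import minimize

    def totally_corrective_update(d_prev, U, gamma, eps=1e-12):
        # d_prev: previous distribution over N examples, shape (N,)
        # U:      matrix with U[n, q] = y_n * h_q(x_n) for past hypotheses q = 1..t, shape (N, t)
        # gamma:  cap on the edge d @ U[:, q] of every past hypothesis
        N = d_prev.shape[0]

        def rel_entropy(d):
            # relative entropy (KL divergence) of d to d_prev; eps guards log(0)
            return np.sum(d * (np.log(d + eps) - np.log(d_prev + eps)))

        # d must be a probability distribution ...
        constraints = [{"type": "eq", "fun": lambda d: d.sum() - 1.0}]
        # ... and the edge of every past hypothesis must be at most gamma
        for q in range(U.shape[1]):
            constraints.append(
                {"type": "ineq", "fun": lambda d, q=q: gamma - d @ U[:, q]}
            )

        res = minimize(rel_entropy, x0=d_prev, bounds=[(0.0, 1.0)] * N,
                       constraints=constraints, method="SLSQP")
        return res.x

A generic constrained optimizer is used here only for clarity; the point is the form of the update, namely an entropy projection onto the intersection of the edge constraints of all past hypotheses, rather than onto the single constraint of the last hypothesis as in the corrective case.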