Several authors have suggested viewing boosting as a gradient descent search for a good fit in function space. At each iteration, observations are re-weighted using the gradient of the underlying loss function. We present a weight-decay approach for the observation weights that is equivalent to "robustifying" the underlying loss function. In the limit of full decay, this approach converges to bagging, which can be viewed as boosting with a linear underlying loss function. We illustrate the practical usefulness of weight decay for improving prediction performance, and we present an equivalence between one form of weight decay and "Huberizing," a statistical method for making loss functions more robust.
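
The correspondence between observation weights and the loss gradient can be made concrete with a small numerical sketch. The Python below is illustrative only: it assumes an exponential loss on the margin m = yF(x), a hypothetical Huberization threshold of -0.5, and function names of our own choosing, none of which are taken from the text. Under those assumptions, it checks that extending the loss linearly below the threshold caps the observation weights, and that a purely linear loss gives every observation the same weight.

```python
import math

def exp_loss(margin):
    """Exponential loss on the margin m = y * F(x) (assumed for illustration)."""
    return math.exp(-margin)

def huberized_exp_loss(margin, threshold):
    """Exponential loss above the threshold, linear continuation below it.

    The linear piece matches the value and slope of exp(-m) at the threshold,
    so the loss stays continuously differentiable.
    """
    if margin >= threshold:
        return math.exp(-margin)
    return math.exp(-threshold) * (1.0 + threshold - margin)

def linear_loss(margin):
    """Linear loss: its gradient is constant, so all weights are equal."""
    return -margin

def weight(loss, margin, eps=1e-6):
    """Observation weight = |dL/dm|, estimated by a central difference."""
    return abs(loss(margin + eps) - loss(margin - eps)) / (2.0 * eps)

# Hypothetical margins: badly misclassified (-3) through confidently correct (2).
margins = [-3.0, -1.0, 0.5, 2.0]
threshold = -0.5  # hypothetical Huberization point, not from the text

plain = [weight(exp_loss, m) for m in margins]
capped = [weight(lambda z: huberized_exp_loss(z, threshold), m) for m in margins]
flat = [weight(linear_loss, m) for m in margins]

cap = math.exp(-threshold)
for m, w_plain, w_capped, w_flat in zip(margins, plain, capped, flat):
    print(f"margin={m:+.1f}  exp={w_plain:.4f}  huberized={w_capped:.4f}  linear={w_flat:.4f}")
```

In this sketch the Huberized weights equal min(exp(-m), exp(-threshold)), so observations with very negative margins no longer dominate the fit. Raising the threshold pushes more observations onto the linear piece, where they all share the same weight, as they do under the linear loss.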