We consider algorithms for combining advice from a set of experts. In each trial, the algorithm receives the predictions of the experts and produces its own prediction. A loss function is applied to measure the discrepancy between the predictions and actual observations. The algorithm keeps a weight for each expert. At each trial the weights are first used to help produce the prediction and then updated according to the observed outcome. Our starting point is Vovk’s Aggregating Algorithm, in which the weights have a simple form: the weight of an expert decreases exponentially as a function of the loss incurred by the expert. The prediction of the Aggregating Algorithm is typically a nonlinear function of the weights and the experts’ predictions. We analyze here a simplified algorithm in which the weights are as in the original Aggregating Algorithm, but the prediction is simply the weighted average of the experts’ predictions. We show that for a large class of loss functions, e...
Jyrki Kivinen, Manfred K. Warmuth
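The simplified algorithm the abstract describes can be sketched in a few lines. Each expert's weight is exponential in its cumulative loss, and the prediction is the plain weighted average of the experts' predictions, not the Aggregating Algorithm's nonlinear prediction. This is a minimal sketch under assumptions the abstract does not fix. It uses the square loss with outcomes in [0, 1] as one example loss, and a user-chosen learning rate `eta` (the default 0.5 is an arbitrary illustrative value, not a constant taken from the paper). The names `WeightedAverageForecaster`, `square_loss`, and `run` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def square_loss(y: float, y_hat: float) -> float:
    """Example loss (an assumption): squared difference, outcomes in [0, 1]."""
    return (y - y_hat) ** 2


class WeightedAverageForecaster:
    """Hypothetical sketch: exponential weights, weighted-average prediction."""

    def __init__(self, n_experts: int, eta: float = 0.5) -> None:
        self.eta = eta                      # learning rate (assumed, user-chosen)
        self.cum_loss = [0.0] * n_experts   # cumulative loss of each expert

    def weights(self) -> List[float]:
        # w_i proportional to exp(-eta * L_i). Subtracting the minimum loss
        # avoids underflow and cancels out after normalization.
        m = min(self.cum_loss)
        w = [math.exp(-self.eta * (loss - m)) for loss in self.cum_loss]
        s = sum(w)
        return [wi / s for wi in w]

    def predict(self, expert_preds: Sequence[float]) -> float:
        # The prediction is simply the weighted average of the experts' predictions.
        return sum(wi * xi for wi, xi in zip(self.weights(), expert_preds))

    def update(self, expert_preds: Sequence[float], y: float) -> None:
        # After the outcome is observed, charge each expert its own loss.
        for i, x in enumerate(expert_preds):
            self.cum_loss[i] += square_loss(y, x)


def run(expert_matrix: Sequence[Sequence[float]], outcomes: Sequence[float],
        eta: float = 0.5) -> Tuple[float, List[float]]:
    """Run the trials in order; return the algorithm's total loss and the experts' losses."""
    f = WeightedAverageForecaster(len(expert_matrix[0]), eta)
    total = 0.0
    for preds, y in zip(expert_matrix, outcomes):
        y_hat = f.predict(preds)       # weights are used first to predict...
        total += square_loss(y, y_hat)
        f.update(preds, y)             # ...then updated from the observed outcome
    return total, f.cum_loss


if __name__ == "__main__":
    # Two experts: one always predicts 1, the other always predicts 0; outcomes are all 1.
    total, expert_losses = run([[1.0, 0.0]] * 3, [1.0, 1.0, 1.0])
    print(total, expert_losses)
```

In the usage example the weight shifts quickly toward the expert that is always right. The algorithm's total loss therefore stays far below the losing expert's loss of 3, though it remains above the best expert's loss of 0. Swapping in another loss means changing `square_loss` only. Whether the weighted average still gives a good bound for that loss is the question the paper studies, and this sketch does not answer it.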