We consider the problem of online learning in settings where the goal is to compete not simply with the rewards of the best expert or stock, but with the best trade-off between rewards and risk. Motivated by finance applications, we consider two common measures balancing returns and risk: the Sharpe ratio [7] and the mean-variance criterion of Markowitz [6]. We first provide negative results establishing the impossibility of no-regret algorithms under these measures, thus providing a stark contrast with the returns-only setting. We then show that the recent algorithm of Cesa-Bianchi et al. [3] achieves nontrivial performance under a modified bicriteria risk-return measure, and also give a no-regret algorithm for a “localized” version of the mean-variance criterion. To our knowledge, this paper initiates the investigation of explicit risk considerations in the standard models of worst-case online learning.
Eyal Even-Dar, Michael J. Kearns, Jennifer Wortman
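
For readers unfamiliar with the two risk-return measures named in the abstract, the following is a minimal sketch of their textbook forms, computed over a sequence of per-period returns. The abstract does not state the exact definitions used in the paper, so the function names and the `risk_free` and `risk_aversion` parameters are illustrative assumptions, and the population (not sample) variance is an arbitrary choice made here.

```python
from statistics import fmean, pstdev, pvariance


def sharpe_ratio(returns, risk_free=0.0):
    """Textbook Sharpe ratio: mean excess return over the standard deviation.

    `risk_free` is an assumed per-period benchmark return (hypothetical
    parameter; the abstract does not specify one).
    """
    sd = pstdev(returns)  # population standard deviation of the returns
    if sd == 0:
        raise ValueError("Sharpe ratio is undefined for zero-variance returns")
    return (fmean(returns) - risk_free) / sd


def mean_variance(returns, risk_aversion=1.0):
    """Textbook mean-variance score: mean return minus a variance penalty.

    `risk_aversion` is an assumed trade-off weight (hypothetical parameter).
    """
    return fmean(returns) - risk_aversion * pvariance(returns)


# Illustrative (made-up) per-period returns for a single expert or stock.
example_returns = [0.02, -0.01, 0.03, 0.00]
example_sharpe = sharpe_ratio(example_returns)
example_mv = mean_variance(example_returns)
```

Both scores depend on the whole return sequence through its variance and so, unlike cumulative return, are not sums of per-period quantities. This is one informal way to see why competing with the best expert under these measures differs from the returns-only setting.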