Sciweavers

AAAI
2015

Optimizing the CVaR via Sampling

Conditional Value at Risk (CVaR) is a prominent risk measure that is used extensively in various domains. We develop a new formula for the gradient of the CVaR in the form of a conditional expectation. Based on this formula, we propose a novel sampling-based estimator for the gradient of the CVaR, in the spirit of the likelihood-ratio method. We analyze the bias of the estimator, and prove the convergence of a corresponding stochastic gradient descent algorithm to a local CVaR optimum. Our method makes it possible to consider CVaR optimization in new domains. As an example, we consider a reinforcement learning application and learn a risk-sensitive controller for the game of Tetris.
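The abstract does not spell out the estimator, so the following is only a rough sketch of what a likelihood-ratio style CVaR gradient estimate might look like, not the authors' implementation. It assumes the lower tail of a reward is of interest, that the Value at Risk is replaced by the empirical alpha-quantile, and that the score (the gradient of the log-density with respect to the parameter) is available for each sample. The function name `cvar_gradient_estimate` and the Gaussian toy problem are hypothetical and chosen only for illustration.

```python
import numpy as np


def cvar_gradient_estimate(rewards, scores, alpha):
    """Sketch of a likelihood-ratio style CVaR gradient estimate.

    rewards: shape (n,), sampled rewards r(x_i)
    scores:  shape (n,) or (n, d), d/dtheta log f(x_i; theta)
    alpha:   tail level in (0, 1); the worst alpha fraction is averaged
    """
    rewards = np.asarray(rewards, dtype=float)
    scores = np.asarray(scores, dtype=float)
    n = rewards.shape[0]

    # Assumption: the empirical alpha-quantile stands in for the true VaR.
    var_hat = np.quantile(rewards, alpha)

    # Only samples in the lower tail contribute, weighted by their
    # distance below the estimated VaR.
    weights = (rewards - var_hat) * (rewards <= var_hat)

    # Average of score * (reward - VaR) over the tail, normalized by alpha.
    return weights @ scores / (alpha * n)


# Toy check (hypothetical): X ~ N(theta, 1) and reward R = X.
# Shifting theta shifts the whole distribution, so the exact gradient
# of the CVaR with respect to theta is 1.
rng = np.random.default_rng(0)
theta, alpha, n = 0.5, 0.05, 200_000
x = rng.normal(theta, 1.0, size=n)
score = x - theta  # d/dtheta log N(x; theta, 1)
grad = cvar_gradient_estimate(x, score, alpha)
print(grad)  # should be close to 1
```

In a stochastic gradient scheme, an estimate like `grad` would be recomputed from fresh samples at each step and used to update `theta`; the step-size schedule and any bias correction are left out of this sketch.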
Aviv Tamar, Yonatan Glassner, Shie Mannor
Added 27 Mar 2016
Updated 27 Mar 2016
Type Journal
Year 2015
Where AAAI