In this paper, we study a sequential decision-making problem. The objective is to maximize the total reward while satisfying constraints that are defined at every time step. The novelty of the setup lies in our assumption that both the rewards and the constraints are controlled by a potentially adversarial opponent. To solve the problem, we propose a new expert algorithm that guarantees vanishing regret while violating only a bounded number of constraints. We evaluate the algorithm on a challenging power management problem. Our experiments show that online learning with constraints can be carried out successfully in practice.
Shie Mannor, John N. Tsitsiklis
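The abstract does not give the algorithm's update rule, so the following is only a sketch of the two quantities the guarantee refers to, under assumed definitions: rewards lie in [0, 1], each expert has a per-step Boolean flag saying whether it satisfies that step's constraint, and regret is measured against the best single expert in hindsight (ignoring constraints). The learner here is standard exponential weights (Hedge), not the algorithm proposed in the paper. It ignores the constraints entirely and therefore carries no bound on violations, which is exactly the gap a constrained expert algorithm has to close. The function name `hedge_with_constraint_tracking` and its parameters are hypothetical.

```python
import math
from typing import Sequence, Tuple


def hedge_with_constraint_tracking(
    rewards: Sequence[Sequence[float]],  # rewards[t][i] in [0, 1] (assumed)
    feasible: Sequence[Sequence[bool]],  # feasible[t][i]: expert i meets the step-t constraint (assumed)
    eta: float,                          # Hedge learning rate
) -> Tuple[float, float]:
    """Run plain Hedge on the rewards; return (regret, expected constraint violations).

    Hypothetical illustration: regret is measured against the best single expert
    in hindsight, and violations are counted in expectation under the learner's
    mixed strategy. Constraints do not influence the weights at all.
    """
    n = len(rewards[0])
    log_w = [0.0] * n            # log-weights, kept in log space for numerical stability
    totals = [0.0] * n           # cumulative reward of each expert
    learner_reward = 0.0         # expected cumulative reward of the learner
    expected_violations = 0.0    # expected number of violated per-step constraints

    for r_t, ok_t in zip(rewards, feasible):
        # Turn log-weights into a probability distribution over experts.
        m = max(log_w)
        w = [math.exp(x - m) for x in log_w]
        z = sum(w)
        p = [x / z for x in w]

        # Expected reward and expected violation at this step.
        learner_reward += sum(pi * ri for pi, ri in zip(p, r_t))
        expected_violations += sum(pi for pi, ok in zip(p, ok_t) if not ok)

        # Multiplicative-weights update driven by rewards only.
        for i in range(n):
            totals[i] += r_t[i]
            log_w[i] += eta * r_t[i]

    regret = max(totals) - learner_reward
    return regret, expected_violations


if __name__ == "__main__":
    T, n = 1000, 2
    # Expert 0 always earns 1 and is always feasible; expert 1 earns 0 and always violates.
    rewards = [[1.0, 0.0]] * T
    feasible = [[True, False]] * T
    eta = math.sqrt(8 * math.log(n) / T)
    print(hedge_with_constraint_tracking(rewards, feasible, eta))
```

With `eta = sqrt(8 ln n / T)`, Hedge's regret is at most `sqrt(T ln n / 2)`, so the average regret vanishes as `T` grows. In the toy run, violations stay small only because the best expert happens to be feasible; if the adversary made the high-reward expert infeasible, plain Hedge would keep violating constraints, which is why a dedicated constrained algorithm is needed.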