Parametric regret in uncertain Markov decision processes

15 years 6 months ago

Download www.cim.mcgill.ca

— We consider decision making in a Markovian setup where the reward parameters are not known in advance. Our performance criterion is the gap between the performance of the best strategy that is chosen after the true parameter realization is revealed and the performance of the strategy that is chosen before the parameter realization is revealed. We call this gap the parametric regret. We consider two related problems: minimax regret and mean-variance tradeoff of the regret. The minimax regret strategy minimizes the worst-case regret under the most adversarial possible realization. We show that the problem of computing the minimax regret strategy is NP-hard and propose algorithms to efﬁciently solve it under favorable conditions. The mean-variance tradeoff formulation requires a probabilistic model of the uncertain parameters and looks for a strategy that minimizes a convex combination of the mean and the variance of the regret. We prove that computing such a strategy can be done nu...

Huan Xu, Shie Mannor

Real-time Traffic