Abstract. We consider a control problem where the decision maker interacts with a standard Markov decision process with the exception that the reward functions vary arbitrarily ove...
In many Multi-Agent Systems (MAS), agents (even if selfinterested) need to cooperate in order to maximize their own utilities. Most of the multi-agent learning algorithms focus on...
Jose Enrique Munoz de Cote, Alessandro Lazaric, Ma...
This paper presents our Recurrent Control Neural Network (RCNN), which is a model-based approach for a data-efficient modelling and control of reinforcement learning problems in di...
In the multi-armed bandit (MAB) problem there are k distributions associated with the rewards of playing each of k strategies (slot machine arms). The reward distributions are ini...
We study a problem of dynamic pricing faced by a vendor with limited inventory, uncertain about demand, aiming to maximize expected discounted revenue over an infinite time horiz...