In a facility with front room and back room operations, it is useful to switch workers between the rooms in order to cope with changing customer demand. Assuming stochastic custome...
MDPs are an attractive formalization for planning, but realistic problems often have intractably large state spaces. When we only need a partial policy to get from a fixed start s...
H. Brendan McMahan, Maxim Likhachev, Geoffrey J. G...
Some of the most successful recent applications of reinforcement learning have used neural networks and the TD algorithm to learn evaluation functions. In this paper, we examine t...
In many Multi-Agent Systems (MAS), agents (even if selfinterested) need to cooperate in order to maximize their own utilities. Most of the multi-agent learning algorithms focus on...
Jose Enrique Munoz de Cote, Alessandro Lazaric, Ma...
This paper examines the problem of finding an optimal policy for a Partially Observable Markov Decision Process (POMDP) when the model is not known or is only poorly specified. W...