Distributing scarce resources among agents in a way that maximizes the social welfare of the group is a computationally hard problem when the value of a resource bundle is not lin...
When modeling real-world decision-theoretic planning problems in the Markov decision process (MDP) framework, it is often impossible to obtain a completely accurate estimate of tr...
Karina Valdivia Delgado, Scott Sanner, Leliane Nun...
— In this paper we address the reliability of policies derived by Reinforcement Learning on a limited amount of observations. This can be done in a principled manner by taking in...
Howard's policy iteration algorithm is one of the most widely used algorithms for finding optimal policies for controlling Markov Decision Processes (MDPs). When applied to we...
Policy Reuse is a reinforcement learning technique that efficiently learns a new policy by using past similar learned policies. The Policy Reuse learner improves its exploration b...