When the transition probabilities and rewards of a Markov Decision Process are specified exactly, the problem can be solved without any interaction with the environment. When no s...
The problem of coalition formation when agents are uncertain about the types or capabilities of their potential partners is a critical one. In [3] a Bayesian reinforcement learnin...
We investigate the problem of non-covariant behavior of policy gradient reinforcement learning algorithms. The policy gradient approach is amenable to analysis by information geom...
Many algorithms such as Q-learning successfully address reinforcement learning in single-agent multi-time-step problems. In addition there are methods that address reinforcement l...
We target the problem of closed-loop learning of control policies that map visual percepts to continuous actions. Our algorithm, called Reinforcement Learning of Joint Classes (RLJ...