In partially observable worlds with many agents, nested beliefs are formed when agents simultaneously reason about the unknown state of the world and the beliefs of the other agen...
Luke S. Zettlemoyer, Brian Milch, Leslie Pack Kael...
We present Policy Gradient Actor-Critic (PGAC), a new model-free Reinforcement Learning (RL) method for creating limited-memory stochastic policies for Partially Observable Markov ...
This paper uses partially observable Markov decision processes (POMDP’s) as a basic framework for MultiAgent planning. We distinguish three perspectives: first one is that of a...
Bharaneedharan Rathnasabapathy, Piotr J. Gmytrasie...
This paper introduces a principled approach for the design of a scalable general reinforcement learning agent. This approach is based on a direct approximation of AIXI, a Bayesian...
Joel Veness, Kee Siong Ng, Marcus Hutter, David Si...
Abstract. Learning to act in an unknown partially observable domain is a difficult variant of the reinforcement learning paradigm. Research in the area has focused on model-free m...