Decentralized partially observable Markov decision processes (DEC-POMDPs) form a general framework for planning for groups of cooperating agents that inhabit a stochastic and part...
Matthijs T. J. Spaan, Geoffrey J. Gordon, Nikos A....
Filtering denotes any method whereby an agent updates its belief state—its knowledge of the state of the world—from a sequence of actions and observations. In logical filterin...
Reasoning about agents that we observe in the world is challenging. Our available information is often limited to observations of the agent’s external behavior in the past and p...
H. Van Dyke Parunak, Sven Brueckner, Robert S. Mat...
We present Policy Gradient Actor-Critic (PGAC), a new model-free Reinforcement Learning (RL) method for creating limited-memory stochastic policies for Partially Observable Markov ...
Failure detectors are a service that provides (approximate) information about process crashes in a distributed system. The well-known “eventually perfect” failure detector, 3P...