In apprenticeship learning, the goal is to learn a policy in a Markov decision process that is at least as good as a policy demonstrated by an expert. The difficulty arises in tha...
Allocating scarce resources among agents to maximize global utility is, in general, computationally challenging. We focus on problems where resources enable agents to execute acti...
We present Policy Gradient Actor-Critic (PGAC), a new model-free Reinforcement Learning (RL) method for creating limited-memory stochastic policies for Partially Observable Markov ...
Conventional approaches to 3D scene reconstruction often treat matting and reconstruction as two separate problems, with matting a prerequisite to reconstruction. The problem with...
Jean-Yves Guillemaut, Adrian Hilton, Jonathan Star...
In this paper, we discuss the use of Targeted Trajectory Distribution Markov Decision Processes (TTD-MDPs)—a variant of MDPs in which the goal is to realize a specified distrib...
Sooraj Bhat, David L. Roberts, Mark J. Nelson, Cha...