We propose a new approach to value function approximation which combines linear temporal difference reinforcement learning with subspace identification. In practical applications...
This paper proposes an Integrated MDP and POMDP Learning AgeNT (IMPLANT) architecture for adaptation in modern games. The modern game world basically involves a human player actin...
We present Policy Gradient Actor-Critic (PGAC), a new model-free Reinforcement Learning (RL) method for creating limited-memory stochastic policies for Partially Observable Markov ...
Many complex control problems require sophisticated solutions that are not amenable to traditional controller design. Not only is it difficult to model real world systems, but oft...
Predictive state representation (PSR) models for controlled dynamical systems have recently been proposed as an alternative to traditional models such as partially observable Mark...
Michael R. James, Satinder P. Singh, Michael L. Li...