It is widely accepted that the use of more compact representations than lookup tables is crucial to scaling reinforcement learning (RL) algorithms to real-world problems. Unfortun...
Satinder P. Singh, Tommi Jaakkola, Michael I. Jord...
Abstract. The convex optimisation problem involved in fitting a kernel probit regression (KPR) model can be solved efficiently via an iteratively re-weighted least-squares (IRWLS)...
Aiming to clarify the convergence or divergence conditions for Learning Classifier System (LCS), this paper explores: (1) an extreme condition where the reinforcement process of ...
Most conventional Policy Gradient Reinforcement Learning (PGRL) algorithms neglect (or do not explicitly make use of) a term in the average reward gradient with respect to the pol...
Two notions of optimality have been explored in previous work on hierarchical reinforcement learning (HRL): hierarchical optimality, or the optimal policy in the space defined by ...