Abstract. Recently, some non-regular subclasses of context-free grammars have been found to be efficiently learnable from positive data. In order to use these efficient algorithms ...
Previous research indicates that the human decision-making process is somewhat nonlinear and that nonlinear models would be more suitable than linear models for developing advance...
For a Markov Decision Process with finite state (size S) and action spaces (size A per state), we propose a new algorithm--Delayed Q-Learning. We prove it is PAC, achieving near o...
Alexander L. Strehl, Lihong Li, Eric Wiewiora, Joh...
How should a reinforcement learning agent act if its sole purpose is to efficiently learn an optimal policy for later use? In other words, how should it explore, to be able to exp...
— While the Partially Observable Markov Decision Process (POMDP) provides a formal framework for the problem of robot control under uncertainty, it typically assumes a known and ...