We present the first temporal-difference learning algorithm for off-policy control with unrestricted linear function approximation whose per-time-step complexity is linear in the ...
The learning classifier system XCS is an iterative rulelearning system that evolves rule structures based on gradient-based prediction and rule quality estimates. Besides classifi...
Martin V. Butz, Pier Luca Lanzi, Stewart W. Wilson
We show that linear value-function approximation is equivalent to a form of linear model approximation. We then derive a relationship between the model-approximation error and the...
Ronald Parr, Lihong Li, Gavin Taylor, Christopher ...
Abstract-- We consider reinforcement learning, and in particular, the Q-learning algorithm in large state and action spaces. In order to cope with the size of the spaces, a functio...