We introduce the ALeRT (Action-dependent Learning Rates with Trends) algorithm that makes two modifications to the learning rate and one change to the exploration rate of traditio...
Maria Cutumisu, Duane Szafron, Michael H. Bowling,...
Single-agent reinforcement learners in time-extended domains and multi-agent systems share a common dilemma known as the credit assignment problem. Multi-agent systems have the st...
—Reinforcement learning (RL) is a valuable learning method when the systems require a selection of control actions whose consequences emerge over long periods for which input– ...
In real world scenes, objects to be classified are usually not visible from every direction, since they are almost always positioned on some kind of opaque plane. When moving a cam...
Gaussian Process Temporal Difference (GPTD) learning offers a Bayesian solution to the policy evaluation problem of reinforcement learning. In this paper we extend the GPTD framew...