For many problems which would be natural for reinforcement learning, the reward signal is not a single scalar value but has multiple scalar components. Examples of such problems i...
Abstract. Learning to act in an unknown partially observable domain is a difficult variant of the reinforcement learning paradigm. Research in the area has focused on model-free m...
We present a new algorithm, GM-Sarsa(0), for finding approximate solutions to multiple-goal reinforcement learning problems that are modeled as composite Markov decision processe...