In the context of probabilistic verification, we provide a new notion of trace-equivalence divergence between pairs of Labelled Markov processes. This divergence corresponds to the...
An Unobservable MDP (UMDP) is a POMDP in which there are no observations. An Only-Costly-Observable MDP (OCOMDP) is a POMDP which extends an UMDP by allowing a particular costly a...
Many multiagent problems comprise subtasks which can be considered as reinforcement learning (RL) problems. In addition to classical temporal difference methods, evolutionary algo...
Jan Hendrik Metzen, Mark Edgington, Yohannes Kassa...
To accelerate the learning of reinforcement learning, many types of function approximation are used to represent state value. However function approximation reduces the accuracy o...
— This paper presents a new reinforcement learning algorithm for accelerating acquisition of new skills by real mobile robots, without requiring simulation. It speeds up Q-learni...