This paper describes a method of applying a reinforcement learning artificial intelligence to categorize audio files by mood based on listener response during a performance. The s...
In the context of probabilistic verification, we provide a new notion of trace-equivalence divergence between pairs of Labelled Markov processes. This divergence corresponds to the...
An Unobservable MDP (UMDP) is a POMDP in which there are no observations. An Only-Costly-Observable MDP (OCOMDP) is a POMDP which extends an UMDP by allowing a particular costly a...
Most conventional Policy Gradient Reinforcement Learning (PGRL) algorithms neglect (or do not explicitly make use of) a term in the average reward gradient with respect to the pol...
We present an actor-critic scheme for reinforcement learning in complex domains. The main contribution is to show that planning and I/O dynamics can be separated such that an intra...
Pedro Alejandro Ortega, Daniel Alexander Braun, Si...