Flexible general purpose robots need to tailor their visual processing to their task, on the fly. We propose a new approach to this within a planning framework, where the goal is ...
Reinforcement techniques have been successfully used to maximise the expected cumulative reward of statistical dialogue systems. Typically, reinforcement learning is used to estim...
This paper considers a scenario in which a secondary user makes opportunistic use of a channel allocated to some primary network. The primary network operates in a time-slotted ma...
Anh Tuan Hoang, Ying-Chang Liang, David Tung Chong...
Several algorithms for learning near-optimal policies in Markov Decision Processes have been analyzed and proven efficient. Empirical results have suggested that Model-based Inter...
We consider approximate policy evaluation for finite state and action Markov decision processes (MDP) in the off-policy learning context and with the simulation-based least square...