There exist a number of reinforcement learning algorithms which learn by climbing the gradient of expected reward. Their long-run convergence has been proved, even in partially ob...
— Partially Observable Markov Decision Processes (POMDPs) provide a rich mathematical model to handle realworld sequential decision processes but require a known model to be solv...
In this paper, we investigate the use of reinforcement learning (RL) techniques to the problem of determining dynamic prices in an electronic retail market. As representative mode...
Ad-hoc Grids are highly heterogeneous and dynamic networks, one of the main challenges of resource allocation in such environments is to find mechanisms which do not rely on the ...