We study the problem of dynamic spectrum sensing and access in cognitive radio systems as a partially observed Markov decision process (POMDP). A group of cognitive users cooperati...
Jayakrishnan Unnikrishnan, Venugopal V. Veeravalli
Integrating volatile renewable energy resources into the bulk power grid is challenging, due to the reliability requirement that at each instant the load and generation in the syst...
We introduce TiMDPpoly, an algorithm designed to solve planning problems with durative actions, under probabilistic uncertainty, in a non-stationary, continuous-time context. Miss...
We consider a variant of the classic multi-armed bandit problem (MAB), which we call FEEDBACK MAB, where the reward obtained by playing each of n independent arms varies according...
This paper summarizes research on a new emerging framework for learning to plan using the Markov decision process model (MDP). In this paradigm, two approaches to learning to plan...
Sridhar Mahadevan, Sarah Osentoski, Jeffrey Johns,...