Partially observable Markov decision processes (POMDPs) allow one to model complex dynamic decision or control problems that include both action outcome uncertainty and imperfect ...
— We consider decision making in a Markovian setup where the reward parameters are not known in advance. Our performance criterion is the gap between the performance of the best ...
—For software, the costs of failures are not clearly understood. Often, these costs disappear in the costs of testing, the general developments costs, or the operating expenses. ...
Abstract. Many reinforcement learning domains are highly relational. While traditional temporal-difference methods can be applied to these domains, they are limited in their capaci...
Trevor Walker, Lisa Torrey, Jude W. Shavlik, Richa...
We consider a multi-channel opportunistic communication system where the states of these channels evolve as independent and statistically identical Markov chains (the Gilbert-Elli...