We consider a variant of the classic multi-armed bandit problem (MAB), which we call FEEDBACK MAB, where the reward obtained by playing each of n independent arms varies according...
This paper presents properties and results of a new framework for sequential decision-making in multiagent settings called interactive partially observable Markov decision process...
In this paper, we study a particular subclass of partially observable models, called quasi-deterministic partially observable Markov decision processes (QDET-POMDPs), characterize...
We study the computational complexity of some central analysis problems for One-Counter Markov Decision Processes (OC-MDPs), a class of finitely-presented, countable-state MDPs. O...
Tomas Brazdil, Vaclav Brozek, Kousha Etessami, Ant...
We consider sensor scheduling as the optimal observability problem for partially observable Markov decision processes (POMDPs). This model fits the cases where a Markov process ...