In the classic Bayesian restless multi-armed bandit (RMAB) problem, there are N arms, with rewards on all arms evolving at each time as Markov chains with known parameters. A play...
Wenhan Dai, Yi Gai, Bhaskar Krishnamachari, Qing Z...
We propose a new set of criteria for learning algorithms in multi-agent systems, one that is more stringent and (we argue) better justified than previously proposed criteria. Our cr...
A key problem in reinforcement learning is finding a good balance between the need to explore the environment and the need to gain rewards by exploiting existing knowledge. Much...
This paper addresses the problem of scheduling jobs in soft real-time systems, where the utility of completing each job decreases over time. We present a utility-based framework fo...
One of the most general frameworks for phrasing control problems for complex, redundant robots is operational space control. However, while this framework is of essential importan...