Sciweavers

128 search results - page 23 / 26
» Hierarchically Optimal Average Reward Reinforcement Learning
CORR
2010
Springer
13 years 4 months ago
The Non-Bayesian Restless Multi-Armed Bandit: a Case of Near-Logarithmic Regret
In the classic Bayesian restless multi-armed bandit (RMAB) problem, there are N arms, with rewards on all arms evolving at each time as Markov chains with known parameters. A play...
Wenhan Dai, Yi Gai, Bhaskar Krishnamachari, Qing Z...
NIPS
2004
13 years 9 months ago
New Criteria and a New Algorithm for Learning in Multi-Agent Systems
We propose a new set of criteria for learning algorithms in multi-agent systems, one that is more stringent and (we argue) better justified than previously proposed criteria. Our cr...
Rob Powers, Yoav Shoham
IJCAI
2007
13 years 9 months ago
Using Linear Programming for Bayesian Exploration in Markov Decision Processes
A key problem in reinforcement learning is finding a good balance between the need to explore the environment and the need to gain rewards by exploiting existing knowledge. Much ...
Pablo Samuel Castro, Doina Precup
AUSAI
2005
Springer
14 years 1 month ago
Adaptive Utility-Based Scheduling in Resource-Constrained Systems
This paper addresses the problem of scheduling jobs in soft real-time systems, where the utility of completing each job decreases over time. We present a utility-based framework fo...
David Vengerov
IJRR
2008
13 years 7 months ago
Learning to Control in Operational Space
One of the most general frameworks for phrasing control problems for complex, redundant robots is operational space control. However, while this framework is of essential importan...
Jan Peters, Stefan Schaal