Sciweavers

60 search results - page 9 / 12
» Iteratively Extending Time Horizon Reinforcement Learning
EWRL
2008
13 years 9 months ago
Markov Decision Processes with Arbitrary Reward Processes
Abstract. We consider a control problem where the decision maker interacts with a standard Markov decision process with the exception that the reward functions vary arbitrarily ove...
Jia Yuan Yu, Shie Mannor, Nahum Shimkin
ATAL
2006
Springer
13 years 11 months ago
Learning to cooperate in multi-agent social dilemmas
In many Multi-Agent Systems (MAS), agents (even if self-interested) need to cooperate in order to maximize their own utilities. Most of the multi-agent learning algorithms focus on...
Jose Enrique Munoz de Cote, Alessandro Lazaric, Ma...
ESANN
2007
13 years 9 months ago
The Recurrent Control Neural Network
This paper presents our Recurrent Control Neural Network (RCNN), which is a model-based approach for data-efficient modelling and control of reinforcement learning problems in di...
Anton Maximilian Schäfer, Steffen Udluft, Han...
COLT
2008
Springer
13 years 9 months ago
Adapting to a Changing Environment: the Brownian Restless Bandits
In the multi-armed bandit (MAB) problem there are k distributions associated with the rewards of playing each of k strategies (slot machine arms). The reward distributions are ini...
Aleksandrs Slivkins, Eli Upfal
IOR
2010
13 years 6 months ago
Dynamic Pricing with a Prior on Market Response
We study a problem of dynamic pricing faced by a vendor with limited inventory, uncertain about demand, aiming to maximize expected discounted revenue over an infinite time horiz...
Vivek F. Farias, Benjamin Van Roy