Sciweavers

Search results for: Optimal policy switching algorithms for reinforcement learni...
CORR
2011
Springer
Close the Gaps: A Learning-while-Doing Algorithm for a Class of Single-Product Revenue Management Problems
In this work, we consider a retailer selling a single product with limited on-hand inventory over a finite selling season. Customer demand arrives according to a Poisson process,...
Zizhuo Wang, Shiming Deng, Yinyu Ye

ICML
2009
IEEE
Regularization and feature selection in least-squares temporal difference learning
We consider the task of reinforcement learning with linear value function approximation. Temporal difference algorithms, and in particular the Least-Squares Temporal Difference (L...
J. Zico Kolter, Andrew Y. Ng

JSAC
2011
An Anti-Jamming Stochastic Game for Cognitive Radio Networks
Various spectrum management schemes have been proposed in recent years to improve the spectrum utilization in cognitive radio networks. However, few of them have considered the ...
Beibei Wang, Yongle Wu, K. J. Ray Liu, T. Charles ...

ICML
2001
IEEE
Symmetry in Markov Decision Processes and its Implications for Single Agent and Multiagent Learning
This paper examines the notion of symmetry in Markov decision processes (MDPs). We define symmetry for an MDP and show how it can be exploited for more effective learning in singl...
Martin Zinkevich, Tucker R. Balch

NIPS
2008
Goal-directed decision making in prefrontal cortex: a computational framework
Research in animal learning and behavioral neuroscience has distinguished between two forms of action control: a habit-based form, which relies on stored action values, and a goal...
Matthew Botvinick, James An