Sciweavers

118 search results - page 20 / 24
» An Evolutionary Random Policy Search Algorithm for Solving M...
Sort
View
CDC
2010
IEEE
136views Control Systems» more  CDC 2010»
13 years 2 months ago
Pathologies of temporal difference methods in approximate dynamic programming
Approximate policy iteration methods based on temporal differences are popular in practice, and have been tested extensively, dating to the early nineties, but the associated conve...
Dimitri P. Bertsekas
WECWIS
2005
IEEE
141views ECommerce» more  WECWIS 2005»
14 years 1 months ago
An Adaptive Bilateral Negotiation Model for E-Commerce Settings
This paper studies adaptive bilateral negotiation between software agents in e-commerce environments. Specifically, we assume that the agents are self-interested, the environment...
Vidya Narayanan, Nicholas R. Jennings
ICML
2007
IEEE
14 years 8 months ago
Multi-task reinforcement learning: a hierarchical Bayesian approach
We consider the problem of multi-task reinforcement learning, where the agent needs to solve a sequence of Markov Decision Processes (MDPs) chosen randomly from a fixed but unknow...
Aaron Wilson, Alan Fern, Soumya Ray, Prasad Tadepa...
ICML
2008
IEEE
14 years 8 months ago
Reinforcement learning in the presence of rare events
We consider the task of reinforcement learning in an environment in which rare significant events occur independently of the actions selected by the controlling agent. If these ev...
Jordan Frank, Shie Mannor, Doina Precup
QUESTA
2010
112views more  QUESTA 2010»
13 years 5 months ago
Admission control for a multi-server queue with abandonment
In a M/M/N+M queue, when there are many customers waiting, it may be preferable to reject a new arrival rather than risk that arrival later abandoning without receiving service. O...
Yasar Levent Koçaga, Amy R. Ward