Sciweavers

81 search results - page 14 / 17
» The Optimal Reward Baseline for Gradient-Based Reinforcement...
Sort
View
JMLR
2010
119views more  JMLR 2010»
13 years 2 months ago
A Convergent Online Single Time Scale Actor Critic Algorithm
Actor-Critic based approaches were among the first to address reinforcement learning in a general setting. Recently, these algorithms have gained renewed interest due to their gen...
Dotan Di Castro, Ron Meir
PKDD
2010
Springer
164views Data Mining» more  PKDD 2010»
13 years 5 months ago
Efficient Planning in Large POMDPs through Policy Graph Based Factorized Approximations
Partially observable Markov decision processes (POMDPs) are widely used for planning under uncertainty. In many applications, the huge size of the POMDP state space makes straightf...
Joni Pajarinen, Jaakko Peltonen, Ari Hottinen, Mik...
ICAC
2005
IEEE
14 years 1 months ago
Self-Optimizing Architecture for QoS Provisioning in Differentiated Services
This paper presents a scalable and self-optimizing architecture for Quality-of-Service (QoS) provisioning in the Differentiated Services (DiffServ) framework. The proposed archite...
Daniel Yagan, Chen-Khong Tham
ML
2002
ACM
133views Machine Learning» more  ML 2002»
13 years 7 months ago
Finite-time Analysis of the Multiarmed Bandit Problem
Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the search for a balance between exploring the environment to find profitable actions while t...
Peter Auer, Nicolò Cesa-Bianchi, Paul Fisch...
SASO
2009
IEEE
14 years 2 months ago
Distributed W-Learning: Multi-Policy Optimization in Self-Organizing Systems
—Large-scale agent-based systems are required to self-optimize towards multiple, potentially conflicting, policies of varying spatial and temporal scope. As a result, not all ag...
Ivana Dusparic, Vinny Cahill