Search Sciweavers | Sciweavers

69 search results - page 9 / 14

» PAC-Bayesian Policy Evaluation for Reinforcement Learning

NIPS
2008

Regularized Policy Iteration

15 years 7 months ago

In this paper we consider approximate policy-iteration-based reinforcement learning algorithms. In order to implement a flexible function approximation scheme we propose the use o...

Amir Massoud Farahmand, Mohammad Ghavamzadeh, Csab...

ECAI
2008
Springer

Exploiting locality of interactions using a policy-gradient approach in multiagent learning

15 years 7 months ago

In this paper, we propose a policy gradient reinforcement learning algorithm to address transition-independent Dec-POMDPs. This approach aims at implicitly exploiting the locality...

Francisco S. Melo

SASO
2009
IEEE

Distributed W-Learning: Multi-Policy Optimization in Self-Organizing Systems

16 years 10 days ago

Large-scale agent-based systems are required to self-optimize towards multiple, potentially conflicting, policies of varying spatial and temporal scope. As a result, not all ag...

Ivana Dusparic, Vinny Cahill

ESANN
2008

Learning to play Tetris applying reinforcement learning methods

15 years 7 months ago

In this paper the application of reinforcement learning to Tetris is investigated; in particular, the idea of temporal difference learning is applied to estimate the state value funct...

Alexander Groß, Jan Friedland, Friedhelm Sch...

ICML
2010
IEEE

Toward Off-Policy Learning Control with Function Approximation

15 years 6 months ago

We present the first temporal-difference learning algorithm for off-policy control with unrestricted linear function approximation whose per-time-step complexity is linear in the ...

Hamid Reza Maei, Csaba Szepesvári, Shalabh ...
