Search Sciweavers | Sciweavers

69 search results - page 7 / 14

» PAC-Bayesian Policy Evaluation for Reinforcement Learning

160

click to vote

GECCO
2006
Springer

133views Optimization» more GECCO 2006»

On-line evolutionary computation for reinforcement learning in stochastic domains

15 years 9 months ago

Download userweb.cs.utexas.edu

In reinforcement learning, an agent interacting with its environment strives to learn a policy that specifies, for each state it may encounter, what action to take. Evolutionary c...

Shimon Whiteson, Peter Stone

claim paper

Read More »

202

click to vote

IJCAI
2007

179views Artificial Intelligence» more IJCAI 2007»

Heuristic Selection of Actions in Multiagent Reinforcement Learning

15 years 7 months ago

Download www.ijcai.org

This work presents a new algorithm, called Heuristically Accelerated Minimax-Q (HAMMQ), that allows the use of heuristics to speed up the wellknown Multiagent Reinforcement Learni...

Reinaldo A. C. Bianchi, Carlos H. C. Ribeiro, Anna...

claim paper

Read More »

253

click to vote

ILP
2007
Springer

283views Automated Reasoning» more ILP 2007»

Building Relational World Models for Reinforcement Learning

15 years 11 months ago

Download ftp.cs.wisc.edu

Abstract. Many reinforcement learning domains are highly relational. While traditional temporal-difference methods can be applied to these domains, they are limited in their capaci...

Trevor Walker, Lisa Torrey, Jude W. Shavlik, Richa...

claim paper

Read More »

196

click to vote

AAAI
2012

205views Intelligent Agents» more AAAI 2012»

Kernel-Based Reinforcement Learning on Representative States

13 years 8 months ago

Download www.bkveton.com

Markov decision processes (MDPs) are an established framework for solving sequential decision-making problems under uncertainty. In this work, we propose a new method for batchmod...

Branislav Kveton, Georgios Theocharous

claim paper

Read More »

151

click to vote

ATAL
2008
Springer

123views Intelligent Agents» more ATAL 2008»

Sigma point policy iteration

15 years 7 months ago

Download web.mit.edu

In reinforcement learning, least-squares temporal difference methods (e.g., LSTD and LSPI) are effective, data-efficient techniques for policy evaluation and control with linear v...

Michael H. Bowling, Alborz Geramifard, David Winga...

claim paper

Read More »

« Prev « First page 7 / 14 Last » Next »

Sciweavers

Explore & Download

Productivity Tools

Sciweavers