Search Sciweavers | Sciweavers


ML
2002
ACM


Technical Update: Least-Squares Temporal Difference Learning

13 years 9 months ago

TD(λ) is a popular family of algorithms for approximate policy evaluation in large MDPs. TD(λ) works by incrementally updating the value function after each observed transition. It h...

Justin A. Boyan


AGENTS
2001
Springer


Using background knowledge to speed reinforcement learning in physical agents

14 years 2 months ago

Download www.isle.org

This paper describes Icarus, an agent architecture that embeds a hierarchical reinforcement learning algorithm within a language for specifying agent behavior. An Icarus program e...

Daniel G. Shapiro, Pat Langley, Ross D. Shachter


ESANN
2004


High-accuracy value-function approximation with neural networks applied to the acrobot

13 years 11 months ago

Download remi.coulom.free.fr

Several reinforcement-learning techniques have already been applied to the Acrobot control problem, using linear function approximators to estimate the value function. In this pape...

Rémi Coulom


CHI
2009
ACM


Comparing the use of tangible and graphical programming languages for informal science education

14 years 10 months ago

Download www.cs.tufts.edu

Much of the work done in the field of tangible interaction has focused on creating tools for learning; however, in many cases, little evidence has been provided that tangible inte...

Michael S. Horn, Erin Treacy Solovey, R. Jordan Cr...


JCP
2007


Noisy K Best-Paths for Approximate Dynamic Programming with Application to Portfolio Optimization

13 years 9 months ago

Download www.academypublisher.com

We describe a general method to transform a non-Markovian sequential decision problem into a supervised learning problem using a K-best-paths algorithm. We consider an a...

Nicolas Chapados, Yoshua Bengio
