Sciweavers

109 search results - page 16 / 22
» Policy teaching through reward function learning
Sort
View
CORR
2008
Springer
64views Education» more  CORR 2008»
13 years 7 months ago
Linearly Parameterized Bandits
We consider bandit problems involving a large (possibly infinite) collection of arms, in which the expected reward of each arm is a linear function of an r-dimensional random vect...
Paat Rusmevichientong, John N. Tsitsiklis
AIPS
2007
13 years 10 months ago
Learning to Plan Using Harmonic Analysis of Diffusion Models
This paper summarizes research on a new emerging framework for learning to plan using the Markov decision process model (MDP). In this paradigm, two approaches to learning to plan...
Sridhar Mahadevan, Sarah Osentoski, Jeffrey Johns,...
NIPS
2007
13 years 9 months ago
Reinforcement Learning in Continuous Action Spaces through Sequential Monte Carlo Methods
Learning in real-world domains often requires to deal with continuous state and action spaces. Although many solutions have been proposed to apply Reinforcement Learning algorithm...
Alessandro Lazaric, Marcello Restelli, Andrea Bona...
IAT
2005
IEEE
14 years 1 months ago
Self-Organizing Cognitive Agents and Reinforcement Learning in Multi-Agent Environment
This paper presents a self-organizing cognitive architecture, known as TD-FALCON, that learns to function through its interaction with the environment. TD-FALCON learns the value ...
Ah-Hwee Tan, Dan Xiao
NIPS
2007
13 years 9 months ago
Managing Power Consumption and Performance of Computing Systems Using Reinforcement Learning
Electrical power management in large-scale IT systems such as commercial datacenters is an application area of rapidly growing interest from both an economic and ecological perspe...
Gerald Tesauro, Rajarshi Das, Hoi Chan, Jeffrey O....