Sciweavers

» Point-Based Policy Iteration
CORR
2006
Springer
13 years 7 months ago
A Unified View of TD Algorithms; Introducing Full-Gradient TD and Equi-Gradient Descent TD
This paper addresses the issue of policy evaluation in Markov Decision Processes, using linear function approximation. It provides a unified view of algorithms such as TD(λ), LSTD(λ)...
Manuel Loth, Philippe Preux
ISLPED
1999
ACM
13 years 12 months ago
Stochastic modeling of a power-managed system: construction and optimization
The goal of a dynamic power management policy is to reduce the power consumption of an electronic system by putting system components into different states, each representing ce...
Qinru Qiu, Qing Wu, Massoud Pedram
ECAI
2006
Springer
13 years 11 months ago
Strategic Foresighted Learning in Competitive Multi-Agent Games
We describe a generalized Q-learning type algorithm for reinforcement learning in competitive multi-agent games. We make the observation that in a competitive setting with adaptive...
Pieter Jan 't Hoen, Sander M. Bohte, Han La Poutré
JMLR
2006
13 years 7 months ago
Geometric Variance Reduction in Markov Chains: Application to Value Function and Gradient Estimation
We study a sequential variance reduction technique for Monte Carlo estimation of functionals in Markov Chains. The method is based on designing sequential control variates using s...
Rémi Munos
JSAC
2007
13 years 7 months ago
Optimum Power Allocation for Single-User MIMO and Multi-User MIMO-MAC with Partial CSI
We consider both the single-user and the multiuser power allocation problems in MIMO systems, where the receiver side has the perfect channel state information (CSI), a...
Alper Soysal, Sennur Ulukus