This paper addresses the issue of policy evaluation in Markov Decision Processes, using linear function approximation. It provides a unified view of algorithms such as TD(), LSTD()...
-- The goal of a dynamic power management policy is to reduce the power consumption of an electronic system by putting system components into different states, each representing ce...
We describe a generalized Q-learning type algorithm for reinforcement learning in competitive multi-agent games. We make the observation that in a competitive setting with adaptive...
Pieter Jan't Hoen, Sander M. Bohte, Han La Poutr&e...
We study a sequential variance reduction technique for Monte Carlo estimation of functionals in Markov Chains. The method is based on designing sequential control variates using s...
Abstract— We consider both the single-user and the multiuser power allocation problems in MIMO systems, where the receiver side has the perfect channel state information (CSI), a...