As shown in [7], optimal control problems with either ODE or PDE dynamics can be solved efficiently using a setting of consistent approximations obtained by numerical discretizati...
We study a sequential variance reduction technique for Monte Carlo estimation of functionals in Markov Chains. The method is based on designing sequential control variates using s...
We present the first temporal-difference learning algorithm for off-policy control with unrestricted linear function approximation whose per-time-step complexity is linear in the ...
This paper presents the result for Simultaneous Perturbation Stochastic Approximation (SPSA) on the BBOB 2010 noiseless testbed. SPSA is a stochastic gradient approximation strate...
A number of reinforcement learning algorithms have been developed that are guaranteed to converge to the optimal solution when used with lookup tables. It is shown, however, that ...