Abstract: Several approximate policy iteration schemes without value functions, which focus on policy representation using classifiers and address policy learning as a supervis...
Actor-Critic based approaches were among the first to address reinforcement learning in a general setting. Recently, these algorithms have gained renewed interest due to their gen...
In Reinforcement Learning (RL) there has been some experimental evidence that the residual gradient algorithm converges slower than the TD(0) algorithm. In this paper, we use the ...
The Chow parameters of a Boolean function f : {−1, 1}n → {−1, 1} are its n + 1 degree-0 and degree-1 Fourier coefficients. It has been known since 1961 [Cho61, Tan61] that ...
Anindya De, Ilias Diakonikolas, Vitaly Feldman, Ro...