The multi-armed bandit problem is a typical example of the dilemma between exploration and exploitation in reinforcement learning. This problem is expressed as a model of a gambler playi...
Motivated by some crowd motion models in the presence of noise, we consider an optimal control problem governed by the Fokker-Planck equation. We sketch optimality conditions b...
Dual descent methods are commonly used to solve network optimization problems because their implementation can be distributed through the network. However, their convergence rat...
Michael Zargham, A. Ribeiro, Ali Jadbabaie, Asuman...
Reinforcement learning (RL) can be impractical for many high-dimensional problems because of the computational cost of doing stochastic search in large state spaces. We propose a ...
Abstract. The purpose of this paper is (1) to provide a theoretical justification for the use of Monte-Carlo sampling for approximate resolution of NP-hard maximization problems in...