We consider a variant of the classic multi-armed bandit problem (MAB), which we call FEEDBACK MAB, where the reward obtained by playing each of n independent arms varies according...
The Markov chain approximation method is an effective and widely used approach for computing optimal values and controls for stochastic systems. It was extended to nonlinear (and p...
The general stochastic optimal control (SOC) problem in robotics scenarios is often too complex to be solved exactly and in near real time. A classical approximate solution is to ...
Model learning combined with dynamic programming has been shown to be effective for learning control of continuous state dynamic systems. The simplest method assumes the learned mod...
Real-world networks often need to be designed under uncertainty, with only partial information and predictions of demand available at the outset of the design process. The field ...