We consider the problem of estimating the policy gradient in Partially Observable Markov Decision Processes (POMDPs) with a special class of policies that are based on Predictive ...
We consider sensor scheduling as the optimal observability problem for partially observable Markov decision processes (POMDP). This model fits to the cases where a Markov process ...
In stochastic thresholding, the threshold for quantization of a signal is randomized. An estimator based on quantized signal data can be optimized through stochastic thresholding. ...
We propose a highly efficient framework for penalized likelihood kernel methods applied to multiclass models with a large, structured set of classes. As opposed to many previous a...
We consider a variant of the classic multi-armed bandit problem (MAB), which we call FEEDBACK MAB, where the reward obtained by playing each of n independent arms varies according...