In this paper, we consider the problem of planning and learning in infinite-horizon discounted-reward Markov decision problems. We propose a novel iterative direct policy searc...
Abstract. We propose a general methodology based on robust optimization to address the problem of optimally controlling a supply chain subject to stochastic demand in discrete time...
Abstract. We consider a new discriminative learning approach to sequence labeling based on the statistical concept of the Z-score. Given a training set of pairs of hidden-observed ...