Many applications require optimizing an unknown, noisy function that is expensive to evaluate. We formalize this task as a multiarmed bandit problem, where the payoff function is ...
Niranjan Srinivas, Andreas Krause, Sham Kakade, Ma...
Abstract. In preference learning, the algorithm observes pairwise relative judgments (preference) between items as training data for learning an ordering of all items. This is an i...
Abstract. We present an implementation of model-based online reinforcement learning (RL) for continuous domains with deterministic transitions that is specifically designed to achi...
We address an approximation method for Gaussian process (GP) regression, where we approximate covariance by a block matrix such that diagonal blocks are calculated exactly while o...
In this paper, we introduce a new approach for nonlinear equalization based on Gaussian processes for classification (GPC). We propose to measure the performance of this equalizer ...