This paper studies the deviations of the regret in a stochastic multi-armed bandit problem. When the total number of plays n is known beforehand by the agent, Audibert et al. (2009...
In this paper we apply the recent notion of anytime universal intelligence tests to the evaluation of a popular reinforcement learning algorithm, Q-learning. We show that a general...
In order to spot the digits in a handwritten document, each component is sent to a classifier. This is a time-consuming process because a document usually contains several hundr...
Nicola Nobile, Chun Lei He, Malik Waqas Sagheer, L...
Computer systems often reach a point at which the relative cost to increase some tunable parameter is no longer worth the corresponding performance benefit. These “knees” t...
Ville Satopaa, Jeannie R. Albrecht, David Irwin, B...
The problem of reconstructing a region from a set of sample points is common in many geometric applications, including computer vision. It is very helpful to be able to guarantee ...