Abstract—We consider the restless multi-armed bandit (RMAB) problem with unknown dynamics. At each time, a player chooses K out of N (N > K) arms to play. The state of each ar...
We study decision making in environments where the reward is only partially observed, but can be modeled as a function of an action and an observed context. This setting, known as...
We formulate and study a decentralized multi-armed bandit (MAB) problem. There are distributed players competing for independent arms. Each arm, when played, offers i.i.d. reward a...
The SELinux mandatory access control (MAC) policy has recently added a multi-level security (MLS) model which is able to express a fine granularity of control over a subject'...
Boniface Hicks, Sandra Rueda, Luke St. Clair, Tren...
In this paper, we propose a novel adaptive step-size approach for policy gradient reinforcement learning. A new metric is defined for policy gradients that measures the effect of ...
Takamitsu Matsubara, Tetsuro Morimura, Jun Morimot...
Imitation Learning, while applied successfully on many large real-world problems, is typically addressed as a standard supervised learning problem, where it is assumed the trainin...
Policy languages (such as privacy and rights) have had little impact on the wider community. Now that Social Networks have taken off, the need to revisit Policy languages and real...
In the classic Bayesian restless multi-armed bandit (RMAB) problem, there are N arms, with rewards on all arms evolving at each time as Markov chains with known parameters. A play...
Wenhan Dai, Yi Gai, Bhaskar Krishnamachari, Qing Z...
A policy for a minimal reactive agent is a set of condition-action rules used to determine its response to perceived environmental stimuli. When the policy pre-disposes the agent t...
Online trading invariably involves dealings between strangers, so it is important for one party to be able to judge objectively the trustworthiness of the other. In such a setting,...