Sciweavers

ICASSP
2011
IEEE
13 years 3 months ago
Logarithmic weak regret of non-Bayesian restless multi-armed bandit
We consider the restless multi-armed bandit (RMAB) problem with unknown dynamics. At each time, a player chooses K out of N (N > K) arms to play. The state of each ar...
Haoyang Liu, Keqin Liu, Qing Zhao
CORR
2011
Springer
13 years 3 months ago
Doubly Robust Policy Evaluation and Learning
We study decision making in environments where the reward is only partially observed, but can be modeled as a function of an action and an observed context. This setting, known as...
Miroslav Dudík, John Langford, Lihong Li
TSP
2010
13 years 6 months ago
Distributed learning in multi-armed bandit with multiple players
We formulate and study a decentralized multi-armed bandit (MAB) problem. There are distributed players competing for independent arms. Each arm, when played, offers i.i.d. reward a...
Keqin Liu, Qing Zhao
TISSEC
2010
13 years 6 months ago
A logical specification and analysis for SELinux MLS policy
The SELinux mandatory access control (MAC) policy has recently added a multi-level security (MLS) model which is able to express a fine granularity of control over a subject'...
Boniface Hicks, Sandra Rueda, Luke St. Clair, Tren...
JMLR
2010
13 years 6 months ago
Adaptive Step-size Policy Gradients with Average Reward Metric
In this paper, we propose a novel adaptive step-size approach for policy gradient reinforcement learning. A new metric is defined for policy gradients that measures the effect of ...
Takamitsu Matsubara, Tetsuro Morimura, Jun Morimot...
JMLR
2010
13 years 6 months ago
Efficient Reductions for Imitation Learning
Imitation Learning, while applied successfully on many large real-world problems, is typically addressed as a standard supervised learning problem, where it is assumed the trainin...
Stéphane Ross, Drew Bagnell
EIS
2011
13 years 6 months ago
A modelling and reasoning framework for social networks policies
Policy languages (such as privacy and rights) have had little impact on the wider community. Now that Social Networks have taken off, the need to revisit Policy languages and real...
Guido Governatori, Renato Iannella
CORR
2010
Springer
13 years 8 months ago
The Non-Bayesian Restless Multi-Armed Bandit: a Case of Near-Logarithmic Regret
In the classic Bayesian restless multi-armed bandit (RMAB) problem, there are N arms, with rewards on all arms evolving at each time as Markov chains with known parameters. A play...
Wenhan Dai, Yi Gai, Bhaskar Krishnamachari, Qing Z...
CJ
2010
13 years 9 months ago
Designing Effective Policies for Minimal Agents
A policy for a minimal reactive agent is a set of condition-action rules used to determine its response to perceived environmental stimuli. When the policy pre-disposes the agent t...
Krysia Broda, Christopher J. Hogger
ICTAC
2009
Springer
13 years 9 months ago
A First-Order Policy Language for History-Based Transaction Monitoring
Online trading invariably involves dealings between strangers, so it is important for one party to be able to judge objectively the trustworthiness of the other. In such a setting,...
Andreas Bauer, Rajeev Goré, Alwen Tiu