Sciweavers

Search results: Average Reward Timed Games
COLT
2008
Springer
Adapting to a Changing Environment: the Brownian Restless Bandits
In the multi-armed bandit (MAB) problem there are k distributions associated with the rewards of playing each of k strategies (slot machine arms). The reward distributions are ini...
Aleksandrs Slivkins, Eli Upfal
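The bandit setup described above (k arms, each with its own reward distribution) can be illustrated with a small simulator. This is a hypothetical sketch, not the authors' model. The Bernoulli rewards, the reflected Gaussian random-walk drift controlled by `sigma` (a stand-in for a changing environment), and the names `BernoulliBandit`, `play`, and `_reflect` are all assumptions made for illustration.

```python
import random


class BernoulliBandit:
    """Hypothetical k-armed bandit: arm i pays 1 with probability means[i], else 0.

    Assumed drift model (illustration only): after each play, every arm's
    mean takes a Gaussian random-walk step with std `sigma`, reflected into
    [0, 1]. Setting sigma=0 gives the classic stationary multi-armed bandit.
    """

    def __init__(self, means, sigma=0.0, seed=None):
        self.means = list(means)
        self.sigma = sigma
        self.rng = random.Random(seed)

    @property
    def k(self):
        # Number of arms (strategies).
        return len(self.means)

    def play(self, arm):
        # Sample a 0/1 reward from the chosen arm's current distribution.
        reward = 1 if self.rng.random() < self.means[arm] else 0
        # Assumed drift: all arms' means move, not only the one played.
        if self.sigma > 0:
            self.means = [
                self._reflect(m + self.rng.gauss(0.0, self.sigma))
                for m in self.means
            ]
        return reward

    @staticmethod
    def _reflect(x):
        # Fold x back into [0, 1], reflecting off both boundaries.
        x = abs(x) % 2.0
        return 2.0 - x if x > 1.0 else x


# Usage: a round-robin baseline that ignores observed rewards.
bandit = BernoulliBandit([0.2, 0.5, 0.8], sigma=0.01, seed=42)
total = sum(bandit.play(t % bandit.k) for t in range(300))
```

With `sigma=0` the best arm never changes, so a learner can settle on it; with `sigma > 0` an arm that looked best earlier may stop being best, which is why a changing environment calls for continued exploration.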
JAIR
2008
A Multiagent Reinforcement Learning Algorithm with Non-linear Dynamics
Several multiagent reinforcement learning (MARL) algorithms have been proposed to optimize agents' decisions. Due to the complexity of the problem, the majority of the previo...
Sherief Abdallah, Victor R. Lesser
WIOPT
2010
IEEE
Evolutionary forwarding games in Delay Tolerant Networks
In this paper, we apply evolutionary games to non-cooperative forwarding control of Delay Tolerant Networks (DTN). We focus our study on the probability of delivering a message fro...
Rachid El Azouzi, Francesco De Pellegrini, Vijay K...
NIPS
2007
Optimistic Linear Programming gives Logarithmic Regret for Irreducible MDPs
We present an algorithm called Optimistic Linear Programming (OLP) for learning to optimize average reward in an irreducible but otherwise unknown Markov decision process (MDP). O...
Ambuj Tewari, Peter L. Bartlett
COLT
2006
Springer
Online Learning with Variable Stage Duration
We consider online learning in repeated decision problems, within the framework of a repeated game against an arbitrary opponent. For repeated matrix games, well-known results esta...
Shie Mannor, Nahum Shimkin