Search Sciweavers | Sciweavers

119 search results - page 13 / 24

» Average Reward Timed Games

184

COMSNETS
2012

Steptacular: An incentive mechanism for promoting wellness

14 years 3 months ago

Download www.stanford.edu

Abstract—This paper describes Steptacular, an online interactive incentive system for encouraging people to walk more. A trial offering Steptacular to the employees of Accenture-...

Naini Gomes, Deepak Merugu, Gearoid O'Brien, Chinm...

224

JMLR
2010

A Convergent Online Single Time Scale Actor Critic Algorithm

15 years 2 months ago

Download jmlr.csail.mit.edu

Actor-Critic based approaches were among the first to address reinforcement learning in a general setting. Recently, these algorithms have gained renewed interest due to their gen...

Dotan Di Castro, Ron Meir

223

CORR
2010
Springer

The Non-Bayesian Restless Multi-Armed Bandit: a Case of Near-Logarithmic Regret

15 years 4 months ago

Download www.ece.ucdavis.edu

In the classic Bayesian restless multi-armed bandit (RMAB) problem, there are N arms, with rewards on all arms evolving at each time as Markov chains with known parameters. A play...

Wenhan Dai, Yi Gai, Bhaskar Krishnamachari, Qing Z...

206

ATAL
2007
Springer

An incentive mechanism for message relaying in unstructured peer-to-peer systems

16 years 1 month ago

Download www.cs.cmu.edu

Distributed message relaying is an important function of a peer-to-peer system to discover service providers. Existing search protocols in unstructured peer-to-peer systems either ...

Cuihong Li, Bin Yu, Katia P. Sycara

202

JMLR
2010

Regret Bounds and Minimax Policies under Partial Monitoring

15 years 2 months ago

Download jmlr.csail.mit.edu

This work deals with four classical prediction settings, namely full information, bandit, label efficient and bandit label efficient as well as four different notions of regret: p...

Jean-Yves Audibert, Sébastien Bubeck

