Search Sciweavers | Sciweavers

97 search results - page 5 / 20

COLT
2000
Springer

87 views, Machine Learning

Estimation and Approximation Bounds for Gradient-Based Reinforcement Learning

13 years 11 months ago

Download www.cs.iastate.edu

We model reinforcement learning as the problem of learning to control a Partially Observable Markov Decision Process (POMDP), and focus on gradient ascent approache...

Peter L. Bartlett, Jonathan Baxter

ATAL
2010
Springer

136 views, Intelligent Agents

Quasi deterministic POMDPs and DecPOMDPs

13 years 8 months ago

Download www.damas.ift.ulaval.ca

In this paper, we study a particular subclass of partially observable models, called quasi-deterministic partially observable Markov decision processes (QDET-POMDPs), characterize...

Camille Besse, Brahim Chaib-draa

IJCAI
2003

173 views, Artificial Intelligence

A Planning Algorithm for Predictive State Representations

13 years 8 months ago

Download dli.iiit.ac.in

We address the problem of optimally controlling stochastic environments that are partially observable. The standard method for tackling such problems is to define and solve a Part...

Masoumeh T. Izadi, Doina Precup

PERCOM
2007
ACM

189 views, Computer Networks

Sensor Scheduling for Optimal Observability Using Estimation Entropy

14 years 7 months ago

Download people.eng.unimelb.edu.au

We consider sensor scheduling as the optimal observability problem for partially observable Markov decision processes (POMDP). This model fits to the cases where a Markov process ...

Mohammad Rezaeian

FOCS
2007
IEEE

157 views, Theoretical Computer Science

Approximation Algorithms for Partial-Information Based Stochastic Control with Markovian Rewards

14 years 1 month ago

Download www.cis.upenn.edu

We consider a variant of the classic multi-armed bandit problem (MAB), which we call FEEDBACK MAB, where the reward obtained by playing each of n independent arms varies according...

Sudipto Guha, Kamesh Munagala
