Sciweavers

AAAI
2010
Multi-Agent Learning with Policy Prediction
Due to the non-stationary environment, learning in multi-agent systems is a challenging problem. This paper first introduces a new gradient-based learning algorithm, augmenting th...
Chongjie Zhang, Victor R. Lesser
NIPS
1998
Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms
In this paper, we address two issues of long-standing interest in the reinforcement learning literature. First, what kinds of performance guarantees can be made for Q-learning aft...
Michael J. Kearns, Satinder P. Singh
ICMLA
2010
Multimodal Parameter-exploring Policy Gradients
Policy Gradients with Parameter-based Exploration (PGPE) is a novel model-free reinforcement learning method that alleviates the problem of high-variance gradient estima...
Frank Sehnke, Alex Graves, Christian Osendorfer, J...
ACMICEC
2007
ACM
Learning to trade with insider information
This paper introduces algorithms for learning how to trade using insider (superior) information in Kyle's model of financial markets. Prior results in finance theory relied o...
Sanmay Das
CACM
2010
Censored exploration and the dark pool problem
We introduce and analyze a natural algorithm for multi-venue exploration from censored data, which is motivated by the Dark Pool Problem of modern quantitative finance. We prove t...
Kuzman Ganchev, Yuriy Nevmyvaka, Michael Kearns, J...