In this paper, we deal with the sequential decision making problem of agents operating in computational economies, where there is uncertainty regarding the trustworthiness of serv...
W. T. Luke Teacy, Georgios Chalkiadakis, Alex Roge...
Abstract-- Policy Gradients with Parameter-based Exploration (PGPE) is a novel model-free reinforcement learning method that alleviates the problem of high-variance gradient estima...
Frank Sehnke, Alex Graves, Christian Osendorfer, J...
Multi-robot learning faces all of the challenges of robot learning with all of the challenges of multiagent learning. There has been a great deal of recent research on multiagent ...
We address a new learning problem where the goal is to build a predictive model that minimizes prediction time (the time taken to make a prediction) subject to a constraint on mod...
Biswanath Panda, Mirek Riedewald, Johannes Gehrke,...
In this paper we consider the problem of policy evaluation in reinforcement learning, i.e., learning the value function of a fixed policy, using the least-squares temporal-differe...
Alessandro Lazaric, Mohammad Ghavamzadeh, Ré...