In this paper we formulate the problem of grouping the states of a discrete Markov chain of arbitrary order simultaneously with deconvolving its transition probabilities. As the na...
Agents often have to construct plans that obey resource limits for continuous resources whose consumption can only be characterized by probability distributions. While Markov Deci...
This paper extends the framework of partially observable Markov decision processes (POMDPs) to multi-agent settings by incorporating the notion of agent models into the state spac...
Policy search is a method for approximately solving an optimal control problem by performing a parametric optimization search in a given class of parameterized policies. In order ...
Continuous action sets are used in many reinforcement learning (RL) applications in robot control since the control input is continuous. However, discrete action sets a...
Akihiko Yamaguchi, Jun Takamatsu, Tsukasa Ogasawar...