We consider an MDP setting in which the reward function is allowed to change during each time step of play (possibly in an adversarial manner), yet the dynamics remain fixed. Simi...
This paper investigates how to automatically create a dialogue control component of a listening agent to reduce the current high cost of manually creating such components. We coll...
Toyomi Meguro, Ryuichiro Higashinaka, Yasuhiro Min...
Many stochastic planning problems can be represented using Markov Decision Processes (MDPs). A difficulty with using these MDP representations is that the common algorithms for so...
A key problem in reinforcement learning is finding a good balance between the need to explore the environment and the need to gain rewards by exploiting existing knowledge. Much ...
Recent decision-theoric planning algorithms are able to find optimal solutions in large problems, using Factored Markov Decision Processes (fmdps). However, these algorithms need ...
Thomas Degris, Olivier Sigaud, Pierre-Henri Wuille...