
Authorial Idioms for Target Distributions in TTD-MDPs

In designing Markov Decision Processes (MDPs), one must define the world, its dynamics, a set of actions, and a reward function. MDPs are often applied in situations where there is no clear choice of reward function, and in these cases significant care must be taken to construct a reward function that induces the desired behavior. In this paper, we consider an analogous design problem: crafting a target distribution in Targeted Trajectory Distribution MDPs (TTD-MDPs). TTD-MDPs produce probabilistic policies that minimize divergence from a target distribution over trajectories of an underlying MDP. They are an extension of MDPs that provides variety of experience during repeated execution. Here, we present a brief overview of TTD-MDPs along with approaches for constructing target distributions. We then present a novel authorial idiom for creating target distributions using prototype trajectories. We evaluate these approaches on a drama manager for an interactive game.
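To make the core idea concrete, here is a minimal sketch (not the authors' code) of how a target distribution over trajectories induces a local stochastic policy when the trajectory space forms a tree: each branch is chosen with probability proportional to the total target mass of trajectories passing through it. The `target` dictionary and function names below are hypothetical illustrations.

```python
from collections import defaultdict

def local_policy(target):
    """Derive a per-prefix stochastic policy from a target distribution.

    target: dict mapping trajectories (tuples of actions) to probabilities.
    Returns a dict: prefix tuple -> {next_action: probability}, where each
    branch probability is the target mass through that branch divided by
    the target mass through its parent prefix.
    """
    mass = defaultdict(float)  # total target probability flowing through each prefix
    for traj, p in target.items():
        for i in range(len(traj) + 1):
            mass[traj[:i]] += p

    policy = defaultdict(dict)
    for traj in target:
        for i in range(len(traj)):
            prefix, action = traj[:i], traj[i]
            policy[prefix][action] = mass[traj[:i + 1]] / mass[prefix]
    return dict(policy)

# Hypothetical target distribution over four two-step trajectories.
target = {("a", "c"): 0.4, ("a", "d"): 0.2, ("b", "c"): 0.1, ("b", "d"): 0.3}
pi = local_policy(target)
# At the root, "a" is chosen with probability 0.4 + 0.2 = 0.6.
```

Sampling actions from `pi` at each prefix reproduces the target distribution exactly in this tree-structured case; the paper's contribution concerns how an author specifies such a target distribution in the first place, e.g. via prototype trajectories.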
Added 02 Oct 2010
Updated 02 Oct 2010
Type Conference
Year 2007
Where AAAI
Authors David L. Roberts, Sooraj Bhat, Kenneth St. Clair, Charles Lee Isbell Jr.