Sciweavers

COGSR
2011

Psychological models of human and optimal performance in bandit problems

In bandit problems, a decision-maker must choose among a set of alternatives, each of which has a fixed but unknown rate of reward, to maximize their total number of rewards over a short sequence of trials. Performing well in these problems requires balancing the need to search for highly rewarding alternatives with the need to capitalize on those alternatives already known to be reasonably good. Consistent with this motivation, we develop a new psychological model that relies on switching between latent exploration and exploitation states. We test the model over a range of two-alternative bandit problems, against both human and optimal decision-making data, comparing it to benchmark models from the reinforcement learning literature. By making inferences about the latent states from optimal decision-making behavior, we characterize how people should switch between exploration and exploitation. By making inferences from human data, we begin to characterize how people actually do swi...
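As a rough illustration of the task setup described in the abstract, the sketch below simulates one short two-alternative bandit problem with an epsilon-greedy chooser. Epsilon-greedy is a common heuristic from the reinforcement learning literature; it is used here only as a stand-in and is not the latent-state model the paper develops, nor necessarily one of the benchmarks it compares against. The function name `run_epsilon_greedy`, its parameters, and the `(successes + 1) / (pulls + 2)` rate estimate are all assumptions made for this example.

```python
import random


def run_epsilon_greedy(reward_rates, n_trials, epsilon, seed=0):
    """Play one bandit problem and return (choices, total_reward).

    reward_rates: fixed success probability of each alternative
                  (unknown to the chooser, used only to generate rewards).
    n_trials:     length of the trial sequence.
    epsilon:      probability of exploring at random on a given trial.
    """
    rng = random.Random(seed)
    n_arms = len(reward_rates)
    successes = [0] * n_arms  # rewards observed per alternative
    pulls = [0] * n_arms      # times each alternative was chosen
    choices = []
    total_reward = 0

    for _ in range(n_trials):
        if rng.random() < epsilon:
            # Explore: pick any alternative uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the alternative with the highest estimated
            # reward rate, breaking ties at random.
            estimates = [(successes[a] + 1) / (pulls[a] + 2)
                         for a in range(n_arms)]
            best = max(estimates)
            arm = rng.choice([a for a in range(n_arms)
                              if estimates[a] == best])

        # Bernoulli reward drawn from the chosen alternative's fixed rate.
        reward = 1 if rng.random() < reward_rates[arm] else 0
        pulls[arm] += 1
        successes[arm] += reward
        total_reward += reward
        choices.append(arm)

    return choices, total_reward
```

Calling `run_epsilon_greedy([0.2, 0.8], n_trials=8, epsilon=0.1)` plays a single eight-trial problem. A larger `epsilon` spends more trials searching, a smaller one capitalizes sooner on whichever alternative currently looks better, which is the trade-off the abstract describes.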
Michael D. Lee, Shunan Zhang, Miles Munro, Mark Steyvers
Added 13 May 2011
Updated 13 May 2011
Type Journal
Year 2011
Where COGSR
Authors Michael D. Lee, Shunan Zhang, Miles Munro, Mark Steyvers