We investigate the problem of a robot maximizing its long-term average rate of return on work. We present a means of estimating the instantaneous rate of return when work is rewarded in discrete atoms, and a method that uses this estimate to recursively maximize the long-term average return when work is available in localized patches, each with locally diminishing returns. We examine a puck-foraging scenario and test our method in simulation under a variety of conditions. However, the analysis and approach apply to the general case.
Jens Wawerla, Richard T. Vaughan