While exploring to nd better solutions, an agent performing online reinforcement learning (RL) can perform worse than is acceptable. In some cases, exploration might have unsafe, ...
Satinder P. Singh, Andrew G. Barto, Roderic A. Gru...
The selection of the action to do next is one of the central problems faced by autonomous agents. In AI, three approaches have been used to address this problem: the programming-b...
Given an adequate simulation model of the task environment and payoff function that measures the quality of partially successful plans, competition-based heuristics such as geneti...
A central problem in artificial intelligence is to choose actions to maximize reward in a partially observable, uncertain environment. To do so, we must learn an accurate model of ...
Partially Observable Markov Decision Processes (POMDPs) have succeeded in planning domains that require balancing actions that increase an agent's knowledge and actions that ...