Reinforcement learning (RL) is a valuable learning method when a system must select control actions whose consequences emerge over long periods and for which input–output data are not available. Most combinations of fuzzy systems and RL assume that the environment is deterministic. In many problems, however, the consequence of an action may be uncertain or stochastic in nature. In this paper, we propose a novel RL approach that combines the universal-function-approximation capability of fuzzy systems with consideration of probability distributions over the possible consequences of an action. The proposed generalized probabilistic fuzzy RL (GPFRL) method is a modified version of the actor–critic (AC) learning architecture. The learning
William M. Hinojosa, Samia Nefti, Uzay Kaymak