The aim of the Cyber Rodent project [1] is to elucidate the origins of our reward and affective systems by building artificial agents that share the natural biological constraints of self-preservation (foraging) and self-reproduction (mating). This paper presents a method for evolving an agent's exploratory reward by combining the framework of embodied evolution with a constrained policy gradient reinforcement learning algorithm. The biological constraints are modeled as average-reward criteria, and the exploratory reward is computed from the agent's own sensor information. An agent that satisfies a subset of the constraints is allowed to mate with another agent. When two agents mate successfully, a genetic operation, chosen according to their fitness values, is applied to improve the exploratory rewards. Through learning and embodied evolution, a group of agents acquires appropriate exploratory rewards.
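To make the role of the average-reward constraints more concrete, the following is a minimal sketch of the constraint-handling side of a generic Lagrangian (primal-dual) approach to constrained reinforcement learning. It is an illustrative assumption, not the algorithm used in this paper. Running estimates of each constraint's average reward decide whether an agent may mate (a chosen subset of constraints must meet its thresholds). Dual multipliers grow while a constraint is violated and are used to mix the constraint rewards into the exploratory reward that a policy gradient learner would then maximize. All names (`ConstraintTracker`, `update_multipliers`, `shaped_reward`), the thresholds, and the step sizes are hypothetical. The policy gradient update itself and the genetic operations are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class ConstraintTracker:
    """Hypothetical running average-reward estimates, one per biological constraint."""
    thresholds: list[float]      # assumed required average level for each constraint
    step_size: float = 0.01      # assumed averaging step size
    averages: list[float] = field(default_factory=list)

    def __post_init__(self):
        if not self.averages:
            self.averages = [0.0] * len(self.thresholds)

    def update(self, rewards):
        # Exponential moving average as an estimate of each long-run average reward.
        for i, r in enumerate(rewards):
            self.averages[i] += self.step_size * (r - self.averages[i])

    def satisfied(self, indices):
        # Mating eligibility (assumed rule): every constraint in the subset meets its threshold.
        return all(self.averages[i] >= self.thresholds[i] for i in indices)


def update_multipliers(multipliers, tracker, lr=0.1):
    # Dual ascent step: raise lambda_i while constraint i is violated, clip at zero.
    return [max(0.0, lam + lr * (thr - avg))
            for lam, thr, avg in zip(multipliers, tracker.thresholds, tracker.averages)]


def shaped_reward(exploratory, constraint_rewards, multipliers):
    # Lagrangian reward: exploratory reward plus multiplier-weighted constraint rewards.
    return exploratory + sum(lam * c for lam, c in zip(multipliers, constraint_rewards))


if __name__ == "__main__":
    # Index 0 could stand for a foraging constraint, index 1 for a mating constraint.
    tracker = ConstraintTracker(thresholds=[0.5, 0.2], step_size=0.5)
    tracker.update([1.0, 0.0])
    print(tracker.satisfied([0]), tracker.satisfied([0, 1]))
    lams = update_multipliers([0.0, 0.0], tracker)
    print(shaped_reward(0.3, [1.0, 0.0], lams))
```

In this sketch the agent satisfies the first constraint but not the second, so it would be eligible to mate only if the mating rule checks the first constraint alone. The multiplier for the violated constraint becomes positive, which shifts the learner's reward toward satisfying it.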