A gradient-based reinforcement learning approach to dynamic pricing in partially-observable environments