There is increasing research interest in solving routing problems in sensor networks subject to constraints such as data correlation, link reliability and energy conservation. Since information about these constraints is unknown in the environment, we propose a reinforcement learning approach to this problem. To this end, we deploy a Bayesian method that offers a good balance between exploitation and exploration. It estimates the benefit of exploration by its value of information and therefore avoids the error-prone process of parameter tuning, which usually requires human intervention. Experimental results show that this approach outperforms the widely used Q-routing method.
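The abstract does not give the exact Bayesian model. The sketch below is one plausible reading: it uses the myopic value-of-perfect-information (VPI) criterion known from Bayesian Q-learning, swapped in as an illustration and not confirmed as the authors' method. Each neighbor's Q-value is assumed to have an independent Gaussian belief. `Belief`, `myopic_vpi` and `select_next_hop` are hypothetical names. Q-values are treated as rewards to maximize, so Q-routing's delivery-time estimates would need to be negated first. A neighbor is chosen by expected Q-value plus VPI, which means exploration is driven by uncertainty and needs no hand-tuned rate such as an epsilon.

```python
import math
from dataclasses import dataclass


@dataclass
class Belief:
    """Hypothetical Gaussian belief over the Q-value of forwarding to one neighbor."""
    mean: float
    std: float


def _pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _cdf(z: float) -> float:
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_excess(mean: float, std: float, threshold: float) -> float:
    """E[max(X - threshold, 0)] for X ~ N(mean, std^2)."""
    if std <= 0.0:
        return max(mean - threshold, 0.0)
    z = (mean - threshold) / std
    return (mean - threshold) * _cdf(z) + std * _pdf(z)


def expected_shortfall(mean: float, std: float, threshold: float) -> float:
    """E[max(threshold - X, 0)] for X ~ N(mean, std^2)."""
    if std <= 0.0:
        return max(threshold - mean, 0.0)
    z = (threshold - mean) / std
    return (threshold - mean) * _cdf(z) + std * _pdf(z)


def myopic_vpi(beliefs: list[Belief]) -> list[float]:
    """Expected gain from learning each action's true Q-value (needs >= 2 actions)."""
    order = sorted(range(len(beliefs)), key=lambda i: beliefs[i].mean, reverse=True)
    best, second = order[0], order[1]
    vpi = []
    for i, b in enumerate(beliefs):
        if i == best:
            # Information helps only if the current best is worse than the runner-up.
            vpi.append(expected_shortfall(b.mean, b.std, beliefs[second].mean))
        else:
            # Information helps only if this action beats the current best.
            vpi.append(expected_excess(b.mean, b.std, beliefs[best].mean))
    return vpi


def select_next_hop(beliefs: list[Belief]) -> int:
    """Pick the neighbor maximizing expected Q-value plus its value of information."""
    vpi = myopic_vpi(beliefs)
    scores = [b.mean + v for b, v in zip(beliefs, vpi)]
    return max(range(len(beliefs)), key=scores.__getitem__)


neighbors = [Belief(1.0, 0.05), Belief(0.9, 0.8), Belief(0.95, 0.01)]
print(select_next_hop(neighbors))
```

In this usage example, neighbor 1 has a lower mean than neighbor 0. It is still selected because its wide belief makes information about it valuable. Once all beliefs become certain, the VPI terms vanish and the rule reduces to greedy selection.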