Abstract--In this paper, d-AdaptOR, a distributed opportunistic routing scheme for multi-hop wireless ad-hoc networks, is proposed. The scheme uses a reinforcement learning framework to achieve optimal performance adaptively, even in the absence of reliable knowledge about the channel statistics and the network model. It extends an earlier scheme [1] that relied on centralized computation. In contrast, d-AdaptOR operates solely on local information and on coordination with neighboring nodes via message passing, while remaining optimal with respect to an expected average per-packet cost criterion.