We formulate resource sharing in peer-to-peer (P2P) networks as a random-matching gift-giving game in which self-interested peers aim to maximize their own long-term utilities. To give peers an incentive to share their resources voluntarily, we propose protocols that operate according to predetermined social norms. To optimize its long-term performance in such a game, each peer can learn to play the best response by solving an individual stochastic control problem. We first show that when a peer learns in an environment in which its opponents play a fixed strategy, learning gives this peer an advantage (i.e., it increases the learning peer's utility). We then prove that learning remains beneficial when all the peers in the network learn. Moreover, we prove that the network will converge to the “fully-cooperative state” (where a socially optimal outcome is attained) if the update error of the peer...