In this paper, we consider social peer-to-peer (P2P) networks in which peers share their resources (i.e., multimedia content and upload bandwidth). Peers in these networks are self-interested: each determines how to divide its resources among its associated peers (i.e., its actions) so as to maximize its own utility (e.g., multimedia quality). Peers determine their optimal strategies for selecting actions using a Markov Decision Process (MDP) framework, which enables them to maximize their cumulative utilities. We consider heterogeneous peers that differ in their ability to characterize their resource reciprocations, each using only a limited number of states. We investigate how this limited number of states affects the resource reciprocation and the resulting multimedia quality over time. Simulation results show that peers that simultaneously refine their state descriptions can improve the multimedia quality achieved in the resource reciprocation. Moreover, p...