In this paper we focus on the design of high-performance peer-to-peer content sharing systems. In particular, our goal is to achieve global load balancing and short user-request response times. This is a formidable challenge, given the requirement to respect the autonomy of peers, their heterogeneity in processing and storage capacities, their differing content contributions, the huge system scale, and the dynamic system environment. Our approach exploits the semantic categorization of published documents and constructs clusters of peers. We formally define the load-balancing problem in our setting and prove that it is NP-complete. We also present a greedy polynomial-time algorithm that, as our experimental results show, achieves nearly optimal load balancing.