Given the increasing traffic on the World Wide Web (Web), it is difficult for a single popular Web server to handle the demand from its many clients. By clustering a group of Web servers, it is possible to reduce the origin Web server's load significantly and reduce user's response time when accessing a Web document. A fundamental question is how to allocate Web documents among these servers in order to achieve load balancing? In this paper, we are given a collection of documents to be stored on a cluster of Web servers. Each of the servers is associated with resource limits in its memory and its number of HTTP connections. Each document has an associated size and access cost. The problem is to allocate the documents among the servers so that no server's memory size is exceeded, and the load is balanced as equally as possible. In this paper, we show that most simple formulations of this problem are NP-hard, we establish lower bounds on the value of the optimal load, and...