We introduce the problem of load-distance balancing in assigning users of a delay-sensitive networked application to servers. We model the service delay experienced by a user as the sum of a network-incurred delay, which depends on the user's network distance from the server, and a server-incurred delay, which stems from the load on the server. The problem is to minimize the maximum service delay among all users. We address the challenge of finding a near-optimal assignment in a scalable, distributed manner. The key to achieving scalability is using local solutions, whereby each server communicates only with a few close servers. Note, however, that the attainable locality of a solution depends on the workload – when some area in the network is congested, obtaining a near-optimal cost may require offloading users to remote servers, whereas when the network load is uniform, a purely local assignment may suffice. We present algorithms that exploit the opportunity to provide a local solution when ...