Load balancing involves assigning each processor work proportional to its performance, thereby minimizing the execution time of the program. Although static load balancing can solve many problems (e.g., those caused by processor heterogeneity and non-uniform loops) for most regular applications, the transient external load due to multiple users on a network of workstations necessitates a dynamic approach to load balancing. In this paper we examine the behavior of global vs. local, and centralized vs. distributed, load balancing strategies. We show that different schemes are best for different applications under varying program and system parameters. Therefore, customized load balancing schemes become essential for good performance. We present a hybrid compile-time and run-time modeling and decision process which selects (customizes) the best scheme, along with automatic generation of parallel code with calls to a runtime library for load balancing.