We have taken a NIST molecular dynamics simulation program (md3), which was configured as a single sequential process running on a CRAY C90 vector supercomputer, and parallelized it to run in a distributed memory message passing environment. Since portability was a major concern during parallelization, we used the Message Passing Interface (MPI) standard. The features of MPI provide a basic set of interprocess communication primitives on many architectures. The parallel md3 program has two basic algorithms resulting in a MPMD (Multiple Program Multiple Data) structure, versus the more common SPMD (Single Program Multiple Data) structure, and has the potential to exploit heterogeneous processing. For any given number of nodes we have devised an equation to determine the initial node allocation among these multiple programs which yields near optimal load balance. We also dynamically manage the load balance between processes to correct for run time variations and to achieve better perfor...