Multiprocessor memory reference traces provide a wealth of information on the behavior of parallel programs. We have used this information to explore the relationship between kernel-based NUMA management policies and multiprocessor memory architecture. Our trace analysis techniques employ an off-line, optimal cost policy as a baseline against which to compare on-line policies, and as a policyinsensitive tool for evaluating architectural design alternatives. We compare the performance of our optimal policy with that of three implementable policies (two of which appear in previous work), on a variety of applications, with varying relative speeds for page moves and local, global, and remote memory references. Our results indicate that a good NUMA policy must be chosen to match its machine, and confirm that such policies can be both simple and effective. They also indicate that programs for NUMA machines must be written with care to obtain the best performance.
William J. Bolosky, Michael L. Scott, Robert P. Fi