We introduce a set of techniques to both measure and optimize the memory access locality of Java applications running on cc-NUMA servers. These techniques work at the object level and use information gathered from embedded hardware performance monitors. We propose a new NUMA-aware Java heap layout. In addition, we propose using dynamic object migration during garbage collection to move objects to memory local to the processors that access them most. Our optimization technique reduced the number of non-local memory accesses in Java workloads generated from actual runs of the SPECjbb2000 benchmark by up to 41%, and also resulted in a 40% reduction in workload execution time.
Mustafa M. Tikir, Jeffrey K. Hollingsworth
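
The migration idea summarized in the abstract, moving each object to the node whose processors access it most, can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `NumaMigrationSketch`, the methods `preferredNode` and `nonLocalAccesses`, and the per-node access counts are all hypothetical, and the sketch assumes that per-object access counts have already been attributed to NUMA nodes (for example, from hardware performance monitor samples).

```java
public class NumaMigrationSketch {

    /**
     * Hypothetical policy: pick the node with the most recorded accesses to an object.
     * On a tie with the current node, the object stays where it is, so it is not
     * moved without a strict benefit.
     */
    static int preferredNode(long[] accessesPerNode, int currentNode) {
        int best = currentNode;
        for (int node = 0; node < accessesPerNode.length; node++) {
            if (accessesPerNode[node] > accessesPerNode[best]) {
                best = node;
            }
        }
        return best;
    }

    /** Counts accesses issued from any node other than the object's home node. */
    static long nonLocalAccesses(long[] accessesPerNode, int homeNode) {
        long total = 0;
        for (int node = 0; node < accessesPerNode.length; node++) {
            if (node != homeNode) {
                total += accessesPerNode[node];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Illustrative (made-up) access counts for one object on a 4-node machine.
        long[] counts = {12, 340, 5, 0};
        int current = 0;

        int target = preferredNode(counts, current);
        System.out.println("migrate from node " + current + " to node " + target);
        System.out.println("non-local accesses before: " + nonLocalAccesses(counts, current)
                + ", after: " + nonLocalAccesses(counts, target));
    }
}
```

In a real collector, a decision like this would be made while the object is being copied during garbage collection, so the move adds little work beyond the copy itself; the sketch only shows the placement decision.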