Some of the most challenging applications to parallelize scalably are those that perform a relatively small amount of computation per iteration. Attaining high parallel efficiency in such cases requires identifying and solving multiple interacting performance problems. We present case studies of two such applications and of the efforts to scale them to large numbers of processors: NAMD, a parallel classical molecular dynamics application for large biomolecular systems, and CPAIMD, a Car-Parrinello ab initio molecular dynamics application. Both applications are implemented in Charm++, and the performance analysis was carried out using Projections, the performance visualization and analysis tool associated with Charm++. We showcase a series of optimizations facilitated by Projections. The resulting performance of NAMD led to a Gordon Bell award at SC2002, with unprecedented speedup on 3,000 processors and teraflop-level peak performance. We also explore the techniques for applying the performance visualizatio...
Laxmikant V. Kalé, Gengbin Zheng, Chee Wai