Recent studies have shown that programming in a Partitioned Global Address Space (PGAS) language can be more productive than programming in a message-passing model. One reason is the ability to access remote memory implicitly through shared-memory reads and writes. This benefit comes at a cost: communication is very difficult to spot in the program text, because remote reads and writes look exactly the same as local ones. This makes manual debugging of communication performance an arduous task. In this paper, we describe a tool called ti-trend-prof that performs automatic performance debugging for Titanium [12], a PGAS language, using only program traces from small processor configurations and small input sizes. ti-trend-prof presents trends to the programmer to help spot possible communication performance bugs, even for processor configurations and input sizes that have not been run. We used ti-trend-prof on two of the largest Titanium applicatio...
Jimmy Su, Katherine A. Yelick