Recent developments in processing devices such as graphics processing units (GPUs) and multi-core systems offer opportunities to exploit parallel techniques at the chip level to obtain high performance. We discuss the difficulties in establishing benchmark codes that allow fair comparisons across these device architectures while remaining representative of key applications. We report on our use of classical dynamical particle collision simulation codes as benchmarks for comparing modern GPUs. We discuss our findings in terms of architectural features for parallelism as well as clock speed issues.
Daniel P. Playne, Mitchell Johnson, Kenneth A. Haw