Commodity accelerator technologies, including reconfigurable devices, provide an order-of-magnitude performance improvement over mainstream microprocessor systems. A number of compute-intensive scientific applications can therefore potentially benefit from commodity computing devices available in the form of co-processor accelerators. However, there has been little progress in accelerating production-level scientific applications with these technologies because of several programming and performance challenges. One of the key performance challenges is performance sustainability: while accelerator devices often speed up computation substantially, the achievable performance is significantly lower once data transfer costs and overheads are included. We present an application-specific memory characterization technique for an FPGA-accelerated system that enabled us to reduce data transfer overhead by a factor of five for a production-scale scientific application. Our pr...
Sadaf R. Alam, Jeffrey S. Vetter, Melissa C. Smith