Interprocessor communication time can be a significant fraction of the overall execution time of data-parallel applications. The large communication-to-computation ratios of the tasks these applications perform result in suboptimal performance when they are executed on data-parallel architectures. We present an alternative architectural framework, referred to as concurrently communicating SIMD (CCSIMD), which maintains the SIMD execution model while introducing a small degree of task parallelism to exploit communication concurrency. We introduce three different implementations of this architectural framework and illustrate their effect on a suite of data-parallel applications. Results show that CCSIMD architectures can provide a cost-effective way to hide communication latency in data-parallel applications, thereby improving their performance.
Vivek Garg, David E. Schimmel
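
The central idea CCSIMD exploits, overlapping communication with independent computation so that message latency is hidden, can be illustrated in software terms. The following is a minimal sketch using nonblocking MPI, not the paper's hardware mechanism; the ring halo exchange, buffer names, and sizes are assumptions chosen for the example.

    /* Sketch: hide communication latency by overlapping a halo exchange
     * with computation on interior points (illustrative only; hypothetical
     * buffers and sizes, not the CCSIMD hardware). */
    #include <mpi.h>
    #include <stdio.h>

    #define N 1024  /* local elements per process (assumed) */

    int main(int argc, char **argv) {
        double a[N + 2];  /* local data plus two halo cells */
        double b[N + 2];
        MPI_Request reqs[4];

        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        int left  = (rank - 1 + size) % size;  /* ring neighbors */
        int right = (rank + 1) % size;

        for (int i = 0; i < N + 2; i++) a[i] = rank;  /* dummy init */

        /* Post the nonblocking halo exchange first ... */
        MPI_Irecv(&a[0],     1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Irecv(&a[N + 1], 1, MPI_DOUBLE, right, 1, MPI_COMM_WORLD, &reqs[1]);
        MPI_Isend(&a[1],     1, MPI_DOUBLE, left,  1, MPI_COMM_WORLD, &reqs[2]);
        MPI_Isend(&a[N],     1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[3]);

        /* ... then compute on interior points while messages are in flight;
         * this work needs no halo values, so it overlaps the communication. */
        for (int i = 2; i <= N - 1; i++)
            b[i] = 0.5 * (a[i - 1] + a[i + 1]);

        /* Wait for the exchange, then finish the boundary points that
         * depend on the received halo cells. */
        MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE);
        b[1] = 0.5 * (a[0] + a[2]);
        b[N] = 0.5 * (a[N - 1] + a[N + 1]);

        if (rank == 0) printf("done: b[1] = %f\n", b[1]);
        MPI_Finalize();
        return 0;
    }

In this software sketch the programmer must restructure the code to expose the overlap; as the abstract describes it, CCSIMD instead provides the overlap architecturally, keeping the SIMD execution model while a small degree of task parallelism carries the communication concurrently.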