Although the GPGPU concept is well known in image processing, much work remains to be done to fully exploit GPUs as an alternative computation engine. The difficulty lies not in reformulating the algorithm and writing code so that the program runs in parallel. The bigger challenge is achieving good GPU utilization, which requires a careful implementation informed by in-depth knowledge of the performance characteristics of the underlying architecture. This paper shows how to optimize the mapping of the computational parallelism in robust facet image modeling onto the GPU architecture, using fine-grained block-level parallelism, in which multiple GPU cores/threads are assigned to process one pixel, rather than pixel-level parallelism. The dependence of the mapping strategy on the computational profile is characterized.
Seung In Park, Yong Cao, Layne T. Watson