This paper describes a parallel implementation developed to improve the time performance of the Iterative Closest Point (ICP) algorithm. Within each iteration, the correspondence calculations are distributed among the processor resources. At the end of each iteration, the results of the correspondence determination are communicated back to a central processor, which calculates the current transformation. A number of additional techniques were developed to improve upon this basic scheme. Calculating the partial sums within each distributed resource made it unnecessary to transmit the correspondence values back to the central processor, which reduced the communication overhead and improved time performance. Randomly distributing the points among the processor resources resulted in better load balancing, which further improved time performance. We also found that thinning the image by randomly removing a certain percentage of the points did not improve the performance, when view...
Christian Langis, Michael A. Greenspan, Guy Godin
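
The partial-sum and random-distribution ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the transformation is estimated from the centroids and the cross-covariance matrix of the matched point pairs, as in the standard ICP formulation, so that each worker only needs to return a count, two vector sums, and one 3x3 matrix sum. The function names (`closest_point`, `partial_sums`, `combine`, `random_partition`) are hypothetical, the nearest-neighbour search is brute force, and the workers are simulated sequentially rather than run on separate processors.

```python
import random


def closest_point(p, model):
    """Brute-force nearest neighbour of p in model (squared Euclidean distance)."""
    return min(model, key=lambda m: sum((a - b) ** 2 for a, b in zip(p, m)))


def partial_sums(chunk, model):
    """Work of one (simulated) worker: match its points, accumulate sums locally.

    Assumption: only these sums are returned, not the correspondences themselves.
    """
    n = 0
    sum_p = [0.0] * 3
    sum_q = [0.0] * 3
    sum_pq = [[0.0] * 3 for _ in range(3)]
    for p in chunk:
        q = closest_point(p, model)
        n += 1
        for i in range(3):
            sum_p[i] += p[i]
            sum_q[i] += q[i]
            for j in range(3):
                sum_pq[i][j] += p[i] * q[j]
    return n, sum_p, sum_q, sum_pq


def combine(partials):
    """Central step: add the per-worker sums, then form centroids and cross-covariance."""
    n = sum(part[0] for part in partials)
    sum_p = [sum(part[1][i] for part in partials) for i in range(3)]
    sum_q = [sum(part[2][i] for part in partials) for i in range(3)]
    sum_pq = [[sum(part[3][i][j] for part in partials) for j in range(3)]
              for i in range(3)]
    mu_p = [s / n for s in sum_p]
    mu_q = [s / n for s in sum_q]
    # Cross-covariance: (1/n) * sum(p q^T) - mu_p mu_q^T
    cov = [[sum_pq[i][j] / n - mu_p[i] * mu_q[j] for j in range(3)]
           for i in range(3)]
    return mu_p, mu_q, cov


def random_partition(points, n_workers, seed=0):
    """Shuffle the points, then deal them out so chunk sizes differ by at most one."""
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[k::n_workers] for k in range(n_workers)]
```

Under these assumptions, each worker sends back a fixed number of values (1 + 3 + 3 + 9) per iteration regardless of how many points it holds, and `combine` yields the same centroids and cross-covariance, up to floating-point rounding, as a single-processor pass over all points. The rotation and translation would then be computed centrally from `cov`, `mu_p`, and `mu_q`; that step is omitted here.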