With systems such as Road Runner, there is a trend in supercomputing to offload parallel tasks to special-purpose co-processors composed of many relatively simple scalar processors. The cheaper, commodity-class equivalent of such a processor is the graphics card, which potentially offers supercomputer power within the confines of a desktop PC. Graphics cards, however, are not without problems: most cards lack double precision entirely, and those that support it suffer a fairly steep drop in performance when using it. As a result, to make effective use of the graphics card, the computation must be done in single precision. In this paper we propose a method whereby a whole digit of the accuracy lost in single-precision matrix multiply can be regained with only a 7% loss in performance, by applying a compensated summation algorithm in a previously unexplored manner, one that at first glance should not provide any benefit, but empirical evidence will ...
Matthew Badin, Lubomir Bic, Michael B. Dillencourt
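For readers unfamiliar with compensated summation, the following is a minimal sketch of the textbook Kahan algorithm applied to the inner products of a matrix multiply. It is not the variant proposed in this paper, whose details are not given in the abstract. The helper names (`f32`, `dot_naive`, `dot_kahan`, `matmul`) are hypothetical, and single precision is emulated in pure Python by rounding every intermediate result to a 32-bit float, which is an assumption made only so the sketch runs without a GPU.

```python
import struct


def f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dot_naive(a, b):
    """Inner product with every operation rounded to single precision."""
    s = 0.0
    for x, y in zip(a, b):
        s = f32(s + f32(x * y))
    return s


def dot_kahan(a, b):
    """Inner product using Kahan compensated summation in single precision."""
    s = 0.0  # running sum
    c = 0.0  # running compensation: rounding error captured so far
    for x, y in zip(a, b):
        term = f32(f32(x * y) - c)   # apply the correction from the last step
        t = f32(s + term)            # low-order bits of term may be lost here
        c = f32(f32(t - s) - term)   # recover what was lost (with sign flipped)
        s = t
    return s


def matmul(A, B, dot):
    """Multiply two matrices given as lists of rows, using the supplied dot."""
    cols = list(zip(*B))
    return [[dot(row, col) for col in cols] for row in A]


if __name__ == "__main__":
    n = 10000
    a = [f32(0.1)] * n
    b = [1.0] * n
    exact = n * f32(0.1)  # reference value computed in double precision
    print("naive error:", abs(dot_naive(a, b) - exact))
    print("kahan error:", abs(dot_kahan(a, b) - exact))
```

The compensated version costs three extra additions per term, which is why the performance overhead of such schemes matters on hardware where single precision is the only practical option.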