We demonstrate the use of highly parallel graphics processing units (GPUs) to accelerate the Superposition/Convolution (S/C) algorithm to interactive rates while reducing the number of approximations. S/C first transports the incident fluence to compute the total energy released per unit mass (TERMA) grid. Dose is then calculated by superimposing the dose deposition kernel at each point in the TERMA grid and summing the contributions to the surrounding voxels. The TERMA algorithm was enhanced with physically correct multi-spectral attenuation and a novel inverse formulation for increased performance, accuracy and simplicity. Dose deposition utilized a tilted poly-energetic inverse cumulative-cumulative kernel, with the novel option of using volumetric mip-maps to approximate solid angle ray-casting. Exact radiological path ray-casting decreased discretization errors. We achieved a speed-up of 34x-98x over a highly optimized CPU implementation.
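As an illustration of the two S/C stages described above, the following is a minimal one-dimensional sketch and not the GPU implementation reported here. It assumes a single ray through a density profile, a discretized spectrum, and a small symmetric kernel. The function names `terma_along_ray` and `superpose`, and all numerical values, are hypothetical placeholders rather than reference data. Kernel tilting, the inverse formulation, the cumulative-cumulative kernel, and mip-mapping are omitted.

```python
import numpy as np


def terma_along_ray(density, dx, weights, energies, mu_over_rho):
    """Multi-spectral TERMA along one ray (illustrative 1D sketch).

    density     : mass density per voxel along the ray (g/cm^3)
    dx          : voxel length along the ray (cm)
    weights     : relative fluence of each spectral bin (assumed values)
    energies    : energy of each spectral bin (MeV)
    mu_over_rho : mass attenuation coefficient per bin (cm^2/g)
    """
    density = np.asarray(density, dtype=float)
    # Radiological depth at each voxel centre: density integrated along the ray.
    rad_depth = (np.cumsum(density) - 0.5 * density) * dx
    terma = np.zeros_like(density)
    for w, e, mu in zip(weights, energies, mu_over_rho):
        # Each spectral bin is attenuated with its own coefficient, so the
        # beam hardens with depth instead of following a single exponential.
        terma += w * e * mu * np.exp(-mu * rad_depth)
    return terma


def superpose(terma, kernel):
    """Forward superposition: spread each TERMA voxel over its neighbours.

    kernel must have odd length; its centre element is the fraction of
    energy deposited in the source voxel itself.
    """
    n = len(terma)
    half = len(kernel) // 2
    dose = np.zeros(n)
    for i in range(n):                    # source voxel
        for k, frac in enumerate(kernel):
            j = i + k - half              # destination voxel
            if 0 <= j < n:                # energy leaving the grid is dropped
                dose[j] += terma[i] * frac
    return dose
```

In this simplified setting the superposition reduces to a discrete convolution of the TERMA profile with the kernel. The full algorithm differs in that the kernel is tilted to the beam direction and scaled by radiological distance along each ray.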