Interest in Graphical Processing Units (GPUs) is skyrocketing due to their potential to yield spectacular performance on many important computing applications. Unfortunately, writing such efficient GPU kernels requires painstaking manual optimization effort which is very error prone. We contribute the first comprehensive symbolic verifier for kernels written in CUDA C. Called the `Prover of User GPU programs (PUG),' our tool efficiently and automatically analyzes real-world kernels using Satisfiability Modulo Theories (SMT) tools, detecting bugs such as data races, incorrectly synchronized barriers, bank conflicts, and wrong results. PUG's innovative ideas include a novel approach to symbolically encode thread interleavings, exact analysis for correct barrier placement, special methods for avoiding interleaving generation, dividing up the analysis over barrier intervals, and handling loops through three approaches: loop normalization, overapproximation, and invariant finding...