Recent PC graphics cards offer fully programmable, parallel geometry and pixel units with powerful instruction sets for arithmetic and logical operations. In addition to this computational functionality, the pixel (fragment) units provide an efficient memory interface to local graphics data. To take full advantage of this technology, considerable effort has gone into developing algorithms that exploit the intrinsic parallelism and efficient communication on such cards. In many cases, programmable graphics processing units (GPUs) have been used to speed up algorithms previously run on the CPU. In this paper, we demonstrate the benefits of commodity graphics hardware for the parallel implementation of general techniques of numerical computing.