Many scientific and engineering applications, which are increasingly being ported from software to reconfigurable platforms, require Gaussian-distributed random numbers. Generating these numbers efficiently, with few resources and at high clock rates, is therefore an important factor in application performance. In this paper, we demonstrate scalable implementations of the Ziggurat algorithm, a Gaussian random number generator, which we have modified for optimal performance on the Xilinx Virtex-4 FX12 FPGA. The resource-efficient design uses only 233 slices while delivering a throughput of 240 million samples per second. We also discuss a two-way parallel design whose estimated throughput scales almost linearly. Generating multiple Gaussian random numbers per cycle allows multiple concurrent simulations to be implemented on FPGAs with minimal resource overhead.
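
For readers unfamiliar with the method, the following is a minimal floating-point software sketch of the textbook Ziggurat algorithm (after Marsaglia and Tsang). It is not the modified FPGA design presented in this paper. The layer count of 128, the bisection used to find the base-strip boundary `r`, and the names `build_ziggurat` and `ziggurat_sample` are assumptions made for illustration only.

```python
import math
import random


def f(x):
    """Unnormalized Gaussian density exp(-x^2 / 2)."""
    return math.exp(-0.5 * x * x)


def _strip_area(r):
    """Area of the base strip: rectangle [0, r] x [0, f(r)] plus the tail beyond r."""
    tail = math.sqrt(math.pi / 2.0) * math.erfc(r / math.sqrt(2.0))
    return r * f(r) + tail


def _residual(r, n):
    """Stack n - 1 equal-area rectangles on the base strip; return top height minus f(0) = 1."""
    v = _strip_area(r)
    x, y = r, f(r)
    for k in range(n - 1):
        y += v / x
        if k == n - 2:
            break
        if y >= 1.0:
            return 1.0  # overshoot before the last layer: r is too small
        x = math.sqrt(-2.0 * math.log(y))
    return y - 1.0


def build_ziggurat(n=128):
    """Return (r, xs, ys) for n equal-area layers (illustrative helper, n >= 2)."""
    lo, hi = 1.0, 10.0
    for _ in range(200):  # bisection: the residual decreases as r grows
        mid = 0.5 * (lo + hi)
        if _residual(mid, n) > 0.0:
            lo = mid
        else:
            hi = mid
    r = hi
    v = _strip_area(r)
    xs = [v / f(r), r]  # xs[0] is the pseudo-width of the base strip
    ys = [0.0, f(r)]
    for _ in range(n - 2):
        y = ys[-1] + v / xs[-1]
        ys.append(y)
        xs.append(math.sqrt(-2.0 * math.log(y)))
    xs.append(0.0)  # the top layer closes at the mode
    ys.append(1.0)
    return r, xs, ys


def ziggurat_sample(rng, tables):
    """Draw one standard-normal sample using the precomputed tables."""
    r, xs, ys = tables
    n = len(xs) - 1
    while True:
        i = rng.randrange(n)                    # pick a layer uniformly
        sign = 1.0 if rng.random() < 0.5 else -1.0
        x = rng.random() * xs[i]
        if x < xs[i + 1]:                       # fast path: point lies under the curve
            return sign * x
        if i == 0:                              # base strip: sample from the tail beyond r
            while True:
                a = -math.log(1.0 - rng.random()) / r
                b = -math.log(1.0 - rng.random())
                if 2.0 * b > a * a:
                    return sign * (r + a)
        else:                                   # wedge: explicit rejection test
            y = ys[i] + rng.random() * (ys[i + 1] - ys[i])
            if y < f(x):
                return sign * x


if __name__ == "__main__":
    rng = random.Random(1)
    tables = build_ziggurat(128)
    print([round(ziggurat_sample(rng, tables), 3) for _ in range(5)])
```

Most draws return on the fast path after a single table comparison, and only the rare wedge and tail cases evaluate an exponential or a logarithm. This property is what makes the method attractive for a pipelined hardware implementation.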