Parallel programming has become one of the great challenges in the transition from single-core to multicore architectures. In this paper, we investigate the parallelization of Montgomery multiplication, a very common and time-consuming primitive in public-key cryptography. We present a scalable parallel programming scheme, called pSHS, that maps Montgomery multiplication onto a general multicore architecture. The pSHS scheme offers a considerable speedup. Based on 2-, 4-, and 8-core systems, the speedup of a parallelized 2048-bit