The performance of RSA hardware is primarily determined by an efficient implementation of the long integer modular arithmetic and the ability to utilize the Chinese Remainder Theorem (CRT) for the private key operations. This paper presents the multiplier architecture of the RSA crypto chip, a high-speed hardware accelerator for long integer modular arithmetic. The RSA multiplier datapath is reconfigurable to execute either one 1024-bit modular exponentiation or two 512-bit modular exponentiations in parallel. Another significant characteristic of the multiplier core is its high degree of parallelism. The actual RSA prototype contains a 1056 × 16 bit word-serial multiplier which is optimized for modular multiplications according to Barrett's modular reduction method. The multiplier core is dimensioned for a clock frequency of 200 MHz and requires 227 clock cycles for a single 1024-bit modular multiplication. Pipelining in the highly parallel long integer unit allows to achiev...
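As a software illustration of the arithmetic named in this abstract, the following minimal Python sketch combines Barrett reduction with a CRT-based private key operation, in which one full-size exponentiation is replaced by two half-size exponentiations (the two that the reconfigurable datapath can run in parallel). It is not the chip's word-serial algorithm. The function names (`barrett_setup`, `barrett_reduce`, `modexp`, `rsa_private_crt`), the choice of `k` as the bit length of the modulus, and the right-to-left square-and-multiply loop are assumptions made for illustration. The modular inverse via `pow(q, -1, p)` requires Python 3.8 or later.

```python
def barrett_setup(m):
    """Precompute the Barrett constant mu = floor(2^(2k) / m), k = bit length of m."""
    k = m.bit_length()
    mu = (1 << (2 * k)) // m
    return k, mu


def barrett_reduce(x, m, k, mu):
    """Return x mod m for 0 <= x < 2^(2k), using a multiplication and a shift
    in place of a division by m."""
    q = (x * mu) >> (2 * k)   # estimate of floor(x / m), never too large
    r = x - q * m
    while r >= m:             # correct the small underestimate of q
        r -= m
    return r


def modexp(base, exp, m):
    """Right-to-left square-and-multiply; every product is reduced with Barrett."""
    k, mu = barrett_setup(m)
    result = 1 % m
    base %= m
    while exp > 0:
        if exp & 1:
            result = barrett_reduce(result * base, m, k, mu)
        base = barrett_reduce(base * base, m, k, mu)
        exp >>= 1
    return result


def rsa_private_crt(c, d, p, q):
    """Compute c^d mod (p*q) from two half-size exponentiations (Garner recombination)."""
    dp, dq = d % (p - 1), d % (q - 1)
    q_inv = pow(q, -1, p)              # q^(-1) mod p, Python 3.8+
    m_p = modexp(c % p, dp, p)         # independent of m_q: can run in parallel
    m_q = modexp(c % q, dq, q)
    h = (q_inv * (m_p - m_q)) % p
    return m_q + h * q
```

The two calls to `modexp` inside `rsa_private_crt` share no data, which is why a datapath that can be split into two half-width units suits the private key operation.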