In this paper we present an hardware implementation of the RSA algorithm for public-key cryptography. The RSA algorithm consists in the computation of modular exponentials on large integers, that can be reduced to repeated modular multiplications. We present a serial implementation of RSA, which is based upon an optimized version of the RSA algorithm originally proposed by P.L. Montgomery. The proposed architecture is innovative, and it widely exploits specific capabilities of Xilinx programmable devices. As compared to other solutions in the literature, the proposed implementation of the RSA processor has smaller area occupation and comparable performance. The final performance level is a function of the serialization factor. We provide a thorough discussion of design tradeoffs, in terms of area requirements vs performance, for different values of the key length and of the serialization factor.