This paper presents an RSA hardware design that simultaneously achieves high-performance and lowpower. A bit-oriented, split modular multiplication algorithm and architecture are proposed to fully exert the radix-4 computational capability. Further, we identify the switching profile of RSA data and accordingly propose power-optimized designs for the storage elements and key computational components. The complete RSA modular exponentiation hardware has been implemented using cell-based 0.18um CMOS technology. Post-layout simulation shows that the design delivers an average performance of 586kbps at