Modular inverse computation is needed in several public key cryptographic applications. In this work, we present two VLSI hardware implementations used in the calculation of Montgomery modular inverse operation. The implementations are based on the same inversion algorithm, however, one is fixed (fully parallel) and the other is scalable. The scalable design is the novel modification performed on the fixed hardware to make it occupy a small area and operate within better or similar speed. Both hardware designs are compared based on their speed and area. The area of the scalable design is on average 42% smaller than the fixed one. The delay of the designs, however, depends on the actual data size and the maximum numbers the hardware can handle. As the actual data size approach the hardware limit the scalable hardware speedup reduces in comparison to the fixed one, but still its delay is practical.
Adnan Abdul-Aziz Gutub, Alexandre F. Tenca, &Ccedi