In this paper, we analyze the theoretical delay bound of the SHA-1 algorithm and propose architectures to achieve high throughput hardware implementations which approach this bound. According to the results of FPGA implementations, 3,541 Mbps with a pipeline and 893 Mbps without a pipeline were achieved. Moreover, synthesis results using 0.18μm CMOS technology showed that 10.4 Gbps with a pipeline and 3.1 Gbps without a pipeline can be achieved. These results are much faster than previously published results. The high throughputs are due to the unfolding transformation, which reduces the number of required cycles for one block hash. We reduced the required number of cycles to 12 cycles for a 512 bit block and showed that 12 cycles is the optimal in our design.