This work proposes efficient hardware architectures for the Luffa hash algorithm. We explore different design tradeoffs and present several architectures, targeting both compact and high-throughput designs. Implemented using the UMC 0.13 µm CMOS standard cell library, the most compact architecture of Luffa-224/256 requires 18,260 gate equivalents (GE). The same version, optimized for speed, achieves a throughput of almost 32 Gbps, while the throughput of the pipelined design