This paper presents two compact hardware structures for computing the CLEFIA encryption algorithm: one based on the existing state of the art, and a novel structure with a more compact organization. It shows that, by using the embedded components available in FPGAs together with careful scheduling, throughputs above 1 Gbit/s can be achieved with a resource usage as low as 86 LUTs and 3 BRAMs on a Virtex-5 FPGA. Implementation results suggest that a LUT reduction of up to 67% can be achieved at a performance cost of 17% on a Virtex-4 FPGA, resulting in Throughput/Slice efficiency gains of up to 2.5 times compared with the related state of the art.
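
As a rough sanity check on how the reported figures relate, the short sketch below computes the throughput-per-resource ratio implied by a 67% resource reduction and a 17% performance cost. It assumes, purely for illustration, that slice usage scales in proportion to LUT count, which the text does not state; the function name `efficiency_gain` and its parameters are hypothetical and not taken from the paper.

```python
def efficiency_gain(resource_reduction: float, performance_cost: float) -> float:
    """Relative throughput-per-resource gain of a design versus a baseline.

    Both arguments are fractions in [0, 1): 0.67 means 67% fewer resources,
    0.17 means 17% lower throughput. Assumption (not from the paper): slice
    usage shrinks by the same fraction as LUT usage.
    """
    if not 0.0 <= resource_reduction < 1.0:
        raise ValueError("resource_reduction must be in [0, 1)")
    if not 0.0 <= performance_cost < 1.0:
        raise ValueError("performance_cost must be in [0, 1)")

    relative_throughput = 1.0 - performance_cost    # e.g. 0.83 of the baseline
    relative_resources = 1.0 - resource_reduction   # e.g. 0.33 of the baseline
    return relative_throughput / relative_resources


if __name__ == "__main__":
    # Figures quoted for the Virtex-4 comparison: 67% fewer LUTs, 17% slower.
    gain = efficiency_gain(resource_reduction=0.67, performance_cost=0.17)
    print(f"Implied efficiency gain: about {gain:.1f}x")
```

Under that proportionality assumption, the ratio 0.83 / 0.33 comes out at roughly 2.5, which is consistent with the efficiency gain quoted above. Actual slice counts depend on how LUTs are packed by the implementation tools, so this is only an approximation.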