RC4, the widely used stream cipher, is well known for its simplicity and ease of implementation in software. For special-purpose hardware designed for RC4, the best known implementation to date produces 1 keystream byte every 3 clock cycles. In this paper, we take a fresh look at the hardware implementation of RC4 and propose a novel architecture that generates 1 keystream byte per clock cycle. Our strategy generates two consecutive keystream bytes together by unwrapping the RC4 cycles. The same architecture is customized to perform the key scheduling algorithm at a rate of 1 round per clock cycle.
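As a software-only illustration of the idea, the sketch below shows the standard RC4 key scheduling and keystream generation, followed by a variant whose loop body produces two consecutive keystream bytes per iteration. It is a minimal sketch and not the proposed hardware design: the function names (`ksa`, `prga`, `prga_two_per_iteration`) are hypothetical, and in software the two steps still run sequentially, so the code only shows the data dependencies that a two-byte-per-iteration design would have to resolve.

```python
def ksa(key: bytes) -> list[int]:
    """Standard RC4 key scheduling: 256 rounds, one swap per round."""
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    return S


def prga(state: list[int], n: int) -> bytes:
    """Standard RC4 keystream generation: one byte per loop iteration."""
    S = state.copy()  # leave the caller's state untouched
    i = j = 0
    out = bytearray()
    for _ in range(n):
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % 256])
    return bytes(out)


def prga_two_per_iteration(state: list[int], n: int) -> bytes:
    """Same keystream as prga, with two consecutive steps in one loop body.

    Illustrative assumption: this mirrors the idea of handling two
    consecutive bytes together; it is not the paper's circuit.
    """
    S = state.copy()
    i = j = 0
    out = bytearray()
    for _ in range(n // 2):
        # First byte of the pair.
        i1 = (i + 1) % 256
        j1 = (j + S[i1]) % 256
        S[i1], S[j1] = S[j1], S[i1]
        out.append(S[(S[i1] + S[j1]) % 256])
        # Second byte of the pair; it depends on the first swap's result.
        i2 = (i1 + 1) % 256
        j2 = (j1 + S[i2]) % 256
        S[i2], S[j2] = S[j2], S[i2]
        out.append(S[(S[i2] + S[j2]) % 256])
        i, j = i2, j2
    if n % 2:
        # Odd length: one ordinary single step to finish.
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % 256])
    return bytes(out)
```

Calling `prga_two_per_iteration(ksa(b"Key"), 16)` should return the same bytes as `prga(ksa(b"Key"), 16)`, since the variant only regroups the same sequence of operations.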