In this paper, we present an efficient hardware architecture for real-time implementation of the intra prediction algorithm used in the H.264 / MPEG-4 Part 10 video coding standard. The hardware design is based on a novel organization of the intra prediction equations. The architecture is designed to be used as part of an H.264 video decoder for portable applications. The proposed architecture is implemented in Verilog HDL, and the Verilog RTL is verified to work at 70 MHz in a Xilinx II FPGA. In the worst case, the FPGA implementation processes a VGA frame (640x480) in 9.85 ms.
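As a rough consistency check on the figures quoted above, the worst-case frame time can be converted into a frame rate and a per-macroblock cycle budget. This is only a sketch under two assumptions that the text does not state explicitly: that the 9.85 ms worst case is measured at the 70 MHz clock, and that the frame is processed as 16x16 macroblocks, as is standard in H.264. The symbols N_MB and N_cyc are introduced here for illustration only.

\[
\begin{aligned}
N_{\mathrm{MB}} &= \frac{640 \times 480}{16 \times 16} = 1200 \ \text{macroblocks per frame},\\
N_{\mathrm{cyc}} &= 70 \times 10^{6}\ \mathrm{Hz} \times 9.85 \times 10^{-3}\ \mathrm{s} = 689{,}500 \ \text{cycles per frame},\\
\frac{N_{\mathrm{cyc}}}{N_{\mathrm{MB}}} &\approx 575 \ \text{cycles per macroblock},\\
\frac{1}{9.85 \times 10^{-3}\ \mathrm{s}} &\approx 101 \ \text{frames per second}.
\end{aligned}
\]

Under these assumptions, the worst-case rate is well above the 30 frames per second commonly taken as the real-time target for VGA video.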