This paper presents a hardware-friendly fast algorithm and its architecture for motion estimation (ME) in H.264 video coding. The fast algorithm adopts quarter-pel subsampling and mode filtering, which together reduce the computational complexity of integer ME by 75%. For fractional ME, only two modes are refined instead of all candidate modes, which saves about 80% of the fractional ME cycle count on average. Simulation results show that the algorithm increases the bit rate by less than 2% and degrades quality by at most 0.14 dB. Finally, the resulting parallel architecture requires only 58% of the area and 48% of the cycle count of previous designs.
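
As a rough illustration of how subsampling can cut the integer ME matching cost by 75%, the sketch below assumes that one pixel out of every 2x2 group is kept and that the matching cost is the sum of absolute differences (SAD). The abstract specifies neither the sampling pattern nor the cost function, and all function names here are hypothetical.

```python
def sad_full(cur, ref):
    """SAD over every pixel of two equal-sized blocks (lists of rows)."""
    return sum(abs(c - r)
               for row_c, row_r in zip(cur, ref)
               for c, r in zip(row_c, row_r))


def sad_subsampled(cur, ref, step=2):
    """SAD over one pixel per step x step group (assumed sampling pattern)."""
    return sum(abs(cur[y][x] - ref[y][x])
               for y in range(0, len(cur), step)
               for x in range(0, len(cur[0]), step))


def pixels_visited(size, step):
    """Number of pixel differences computed for a size x size block."""
    return len(range(0, size, step)) ** 2


# Toy 16x16 macroblocks with constant values, purely for illustration.
cur = [[10] * 16 for _ in range(16)]
ref = [[7] * 16 for _ in range(16)]

full_ops = pixels_visited(16, 1)   # 256 differences
sub_ops = pixels_visited(16, 2)    # 64 differences
saving = 1 - sub_ops / full_ops    # 0.75, i.e. 75% fewer operations

print(sad_full(cur, ref), sad_subsampled(cur, ref), saving)
```

The subsampled SAD is on a smaller scale than the full SAD, so it should only be compared against other subsampled costs when ranking candidate motion vectors.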