This paper presents a set of fast algorithm and VLSI architecture for HDTV-sized H.264 fractional motion estimation. To solve the long computational latency in HD-sized application, we propose to use the single iteration algorithm with only six search points. This single iteration method halves the cycle count of two iteration methods in previous approaches. Moreover, we propose to use 4x4 Hadamard instead of 8x8 Hadamard as cost function for H.264 high profiles without significant video quality loss. By these techniques, the resulted architecture can save 20% of area and provide over 40% of throughput improvement than the previous work, and is able to support HDTV applications.