Unlike the familiar macroblock-based in-loop deblocking filter in H.264, the VC-1 filters process all edges in one direction across the frame first (horizontal edges for in-loop deblocking, vertical edges for overlap smoothing) and then all edges in the other direction. The entire procedure is time-consuming and places a heavy memory-access load on the whole system. This paper presents a novel method and an efficient integrated architecture design based on a 12×12 overlapped block, which combines overlap smoothing with loop filtering to improve performance and reduce cost by sharing circuits and resources. The architecture can process HDTV 1080p video at 30 fps and HDTV 2048×1536 video at 24 fps when operating at 180 MHz. The same concept is applicable to other video processing algorithms, especially deblocking filters for video post-processing that operate in frame-based order.
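
To put the throughput figures quoted above in perspective, the following minimal sketch estimates the clock-cycle budget per macroblock they imply. It is an illustration only, not a description of the proposed architecture: the function name `cycles_per_macroblock` is hypothetical, and the calculation assumes 16×16 macroblocks, frame dimensions padded up to a whole number of macroblocks, and the full 180 MHz clock being available to the filter.

```python
def cycles_per_macroblock(width, height, fps, clock_hz, mb_size=16):
    """Average clock cycles available per macroblock (hypothetical helper).

    Assumes each frame dimension is padded up to a multiple of mb_size
    and that every clock cycle is available to the filtering hardware.
    """
    # Ceiling division: number of macroblocks covering each dimension
    mbs_wide = -(-width // mb_size)
    mbs_high = -(-height // mb_size)
    mbs_per_second = mbs_wide * mbs_high * fps
    return clock_hz / mbs_per_second


CLOCK_HZ = 180_000_000  # 180 MHz, as quoted in the text

budget_1080p = cycles_per_macroblock(1920, 1080, 30, CLOCK_HZ)
budget_2048 = cycles_per_macroblock(2048, 1536, 24, CLOCK_HZ)

print(f"1080p @ 30 fps:     {budget_1080p:.1f} cycles per macroblock")
print(f"2048x1536 @ 24 fps: {budget_2048:.1f} cycles per macroblock")
```

Under these assumptions the budget is roughly 735 cycles per macroblock for 1080p at 30 fps and roughly 610 cycles for 2048×1536 at 24 fps, so the larger format is the tighter constraint. Both overlap smoothing and loop filtering of a macroblock have to fit within that budget.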