H.264/AVC is a new international standard for the compression of natural video images, in which a deblocking filter has been adopted to remove blocking artifacts. In this paper, we propose an efficient processing order for the deblocking filter and present a VLSI architecture based on that order. By exploiting the data dependence between neighboring 4x4 blocks, our design reduces the required on-chip SRAM bandwidth and increases the throughput of the filter. The architecture has been described in Verilog HDL, simulated with VCS, and synthesized with Synopsys Design Compiler using a 0.25 µm CMOS cell library. The circuit requires about 24k logic gates (not including one 32x64 SRAM and two 32x96 SRAMs) at a working frequency of 100 MHz. The design supports real-time deblocking of HDTV (1280x720, 60 fps) H.264/AVC video, which makes the architecture valuable for the hardware design of H.264/AVC codecs.
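The real-time claim implies a fixed cycle budget per macroblock, which the following minimal sketch works out from the figures quoted above (1280x720, 60 fps, 100 MHz). It assumes the standard 16x16 H.264 macroblock; the function name `cycles_per_macroblock` and its parameters are illustrative and do not come from the paper. The result is only an upper bound on the cycles available per macroblock, not the cycle count the proposed architecture actually uses.

```python
def cycles_per_macroblock(width, height, fps, clock_hz, mb_size=16):
    """Return (macroblocks per frame, macroblocks per second, cycle budget).

    Hypothetical helper for illustration: the budget is the number of clock
    cycles available per macroblock if the filter must keep up in real time.
    """
    # Round up so that partially covered macroblocks are still counted.
    mbs_per_row = -(-width // mb_size)
    mbs_per_col = -(-height // mb_size)
    mbs_per_frame = mbs_per_row * mbs_per_col
    mbs_per_second = mbs_per_frame * fps
    budget = clock_hz / mbs_per_second
    return mbs_per_frame, mbs_per_second, budget


# Figures taken from the abstract: 1280x720 at 60 fps, 100 MHz clock.
mbs_per_frame, mbs_per_second, budget = cycles_per_macroblock(
    1280, 720, 60, 100_000_000
)
print(f"macroblocks per frame:  {mbs_per_frame}")
print(f"macroblocks per second: {mbs_per_second}")
print(f"cycle budget per macroblock: {budget:.1f}")
```

Under these assumptions a frame holds 80 x 45 = 3600 macroblocks, so the filter must process 216,000 macroblocks per second and has roughly 463 clock cycles for each one, including all SRAM accesses.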