In this paper, we study and analyze the computational complexity of the deblocking filter in H.264/AVC baseline decoder based on SimpleScalar/ARM simulator. The simulation result shows that the memory reference, content activity check operations, and filter operations are known to be very time consuming in the decoder of this new video coding standard. In order to improve overall system performance, we propose a configurable, extensible, and synthesizable window-based processing architecture which simultaneously processes the horizontal filtering of vertical edge and vertical filtering of horizontal edge. As a result, the memory performance of the proposed architecture is improved by four times when compared to previous designs. Moreover, the system performance of our window-based architecture significantly outperforms the previous designs from 7 times to 20 times.