In this paper, we present a novel hardware architecture for erosion and dilation with large structuring elements. We propose a modification of the van Herk/Gil-Werman (HGW) algorithm that uses a block-mirroring scheme to simplify propagation and memory access and to minimize memory consumption. This scheme removes the need for backward scanning, which allows the hardware architecture to process very long image lines with low latency. Compared with Lemonnier's architecture in terms of ASIC gate area, our solution reduces the circuit area by a factor of 10 on average.
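For reference, the sketch below shows the classic (unmodified) HGW dilation of a 1D signal by a flat segment of length `k`. It is not the proposed architecture. The signal is split into blocks of `k` samples. A forward scan builds running maxima `g` that restart at each block start, and a backward scan builds running maxima `h` that restart at each block end. Each output then costs a single extra comparison, `max(h[i], g[i + k - 1])`, independent of `k`. The backward scan that produces `h` is the step the block-mirroring scheme is described as avoiding. The mirroring itself is not reproduced here, and the function name `hgw_dilate` is hypothetical.

```python
from typing import List


def hgw_dilate(f: List[float], k: int) -> List[float]:
    """Classic van Herk/Gil-Werman 1D dilation with a flat segment of length k.

    Returns r[i] = max(f[i], ..., f[i + k - 1]) for every window that fits
    entirely inside f (len(f) - k + 1 values). Erosion is obtained by
    replacing max with min.
    """
    n = len(f)
    if k < 1 or k > n:
        raise ValueError("k must satisfy 1 <= k <= len(f)")

    # Forward (causal) scan: running max restarted at the first sample of each block.
    g = [0.0] * n
    for i in range(n):
        g[i] = f[i] if i % k == 0 else max(g[i - 1], f[i])

    # Backward (anti-causal) scan: running max restarted at the last sample of
    # each block (or at the end of the signal for a partial final block).
    h = [0.0] * n
    for i in range(n - 1, -1, -1):
        if i == n - 1 or (i + 1) % k == 0:
            h[i] = f[i]
        else:
            h[i] = max(h[i + 1], f[i])

    # A window of length k spans at most two adjacent blocks: h covers the
    # tail of the first block and g covers the head of the second.
    return [max(h[i], g[i + k - 1]) for i in range(n - k + 1)]


if __name__ == "__main__":
    print(hgw_dilate([1, 3, 2, 5, 4, 0, 1], 3))
```

The number of comparisons per sample stays constant as `k` grows, which is why HGW is attractive for large structuring elements. Its cost is the extra buffering needed for the backward scan.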