An effective way to implement image processing applications is to use embedded processors with dynamically reconfigurable accelerator cores. The processing speed of these processors depends not only on the degree of parallelism but also on local memory utilization, since the local memories are much faster than the global memory. In this paper, we accelerate an optical-flow extraction algorithm based on sum of absolute differences (SAD) calculation using a dynamically reconfigurable ALU array. We exploit the maximum available parallelism and propose a memory allocation method that uses the local memory effectively. The experimental results demonstrate that an image of size 640