Abstract. Image processing is widely used in many applications, including medical imaging, industrial manufacturing and security systems. In these applications, the size of the image is often very large, the processing time should be very small and the real-time constraints should be met. Therefore, during the last decades, there has been an increasing demand to exploit parallelism in applications. It is possible to explore parallelism along three axes: data-level parallelism (DLP), instructionlevel parallelism (ILP) and task-level parallelism (TLP). This paper explores the limitations and bottlenecks of increasing support for parallelism along the DLP and ILP axes in isolation and in combination. To scrutinize the effect of DLP and ILP in our architecture (template), an area model based on the number of ALUs (ILP) and the number of processing elements (DLP) in the template is defined, as well as a performance model. Based on these models and the template, a set of kernels of image p...