Abstract— In this paper, we propose a sub-operation parallelism optimization algorithm in SIMD processor synthesis. Given an initial assembly code and timing constraints, our algorithm synthesizes a processor core with sub-operation parallelism optimization for SIMD functional units. First we consider an initial processor which has sufficient hardware units for executing an initial assembly code. An initial processor core includes the maximum sub-operation parallelism for each SIMD functional unit. By gradually reducing sub-operation parallelism, we can finally have a processor core with small area meeting a given timing constraints. We show the effectiveness of our proposed algorithm through experimental results.