Abstract—Multi-processor systems-on-chip are widely adopted for implementing modern streaming applications to satisfy ever-increasing computing requirements. Predictable memory hierarchies, which make memory access predictable, can better satisfy the strict timing requirements of streaming applications. However, the levels of the memory hierarchy differ in latency and capacity. Hence, system performance depends not only on the task schedule but also on the FIFO size distribution and FIFO allocation, which makes the scheduling problem much more complex. We propose an efficient Iteration-based Task-FIFO Co-Scheduling algorithm to optimize the FIFO size distribution and task/FIFO assignment. Randomly generated Synchronous Dataflow Graphs of different sizes and a set of practical applications are used to evaluate the performance of the proposed method. The experimental results demonstrate that the proposed algorithm outperforms the load balancing method a...