The ubiquity of many-core architectures challenges software developers to write scalable software. To parallelize a data-intensive application on a many-core platform, one must consider both the hardware architecture and the software characteristics when writing parallel code. In this paper, we use a Motion JPEG decoder as an example data-intensive application and TILE64 as an example many-core platform. We parallelize the decoder with two different strategies and observe their impact on program performance and scalability. We design two algorithms, READ and WRITE, which differ in the direction of data movement between processor cores. Experimental results show that the READ algorithm outperforms the WRITE algorithm by 217% when decoding 1080p video on the TILE64 platform. This result indicates that the arrangement of data flows in a data-intensive parallel program can have a large impact on program performance and scalability on a many-core platform.