In this paper we present the venture of porting two different algorithms for solving the two-dimensional advection PDE on the CBE platform, an in-place and an outof-place one, and compare their computational performance, completion time and code productivity. Study of the advection equation reveals data dependencies which lead to limited performance and inefficient scaling to parallel architectures. We explore programming techniques and optimizations which maximize performance for these solver versions. The out-ofplace version is straightforward to implement and achieves greater raw performance than the in-place one, but requires more computational steps to converge. In both cases, achieving high computational performance relies heavily on manual source code optimization, due to compiler incapability to do data vectorization and efficient instruction scheduling. The latter proves to be a key factor in pursuit of high GFLOPS measurements. Keywords-explicit memory hierarchy; Cell Broadba...