Programming multi-processor ASIPs, such as network processors, remains an art because architectures vary widely and there is little support for exploring implementation alternatives. We present a study that implements an IP forwarding router application on two different network processors to better understand the main challenges in programming such multi-processor ASIPs. The goal of this study is to identify, based on detailed profiling of the two architectures, the elements central to a successful deployment of such systems. Our results show that inefficient partitioning can reduce throughput by more than 30%, that better arbitration of resources increases throughput by at least 10%, and that localizing computation relative to the memories can increase the available bandwidth on internal buses by a factor of two. The main observation of our study is that there is a critical lack of tools and methods that support an integrated approach to partitioning, scheduling...