Many DSP algorithms are computationally intensive and are typically implemented using an ensemble of processing elements (PEs) operating in parallel. Results produced by one PE must be communicated to other PEs, and for many applications the cost of implementing this inter-PE communication is very high. Given a DSP algorithm with high communication complexity, it is natural to use a Network-on-Chip (NoC) to implement the communication. We address two key optimization problems that arise in this context: placement, i.e., assigning computations to PEs on the NoC, and scheduling, i.e., constructing a detailed cycle-by-cycle scheme for implementing the communication between PEs on the NoC.