We consider a multi-cell frequency-selective fading uplink channel (network MIMO) from K single-antenna user terminals (UTs) to B cooperative base stations (BSs) with M antennas each. The BSs, assumed to be oblivious of the applied codebooks, forward compressed versions of their observations to a central station (CS) via capacity-limited backhaul links. The CS jointly decodes the messages from all UTs. Since the BSs and the CS are assumed to have no prior channel state information (CSI), the channel needs to be estimated during its coherence time. Based on a lower bound on the ergodic mutual information, we determine the optimal fraction of the coherence time used for channel training, taking different path losses between the UTs and the BSs into account. We then study how the optimal training length is impacted by the backhaul capacity. Although our analytical results based on random matrix theory are proved to be tight in the large system limit, we show by simulations that they provi...