We present a novel language identification technique based on our recently developed deep-structured conditional random fields (CRFs). The deep-structured CRF is a multi-layer CRF model in which the input observation sequence of each higher layer consists of the lower layer’s observation sequence together with the frame-level marginal probabilities that the lower layer produces. In this paper we extend the original deep-structured CRF by allowing distinct state representations at different layers, and we demonstrate the benefits of this extension. We propose an unsupervised algorithm for pre-training the intermediate layers, cast as a multi-objective programming problem that minimizes the average frame-level conditional entropy while maximizing the state occupation entropy. Empirical evaluation on a seven-language/dialect voice mail routing task shows that our approach achieves a routing accuracy (RA) of 86.4% and an average equal error rate (EER) of 6.6%. These results are significantly better than th...
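The two pre-training quantities named above, and the way a higher layer's input is assembled, can be illustrated with a minimal sketch. It assumes the frame-level marginal probabilities are already available as a list of per-frame distributions. The function names, the `weight` parameter, and the weighted-sum scalarization are hypothetical choices made for illustration and are not taken from the paper, which does not state here how the two objectives are combined.

```python
import math


def entropy(dist):
    """Shannon entropy (in nats) of a discrete distribution; 0*log(0) is treated as 0."""
    return -sum(p * math.log(p) for p in dist if p > 0.0)


def pretraining_terms(posteriors):
    """Return (average frame-level conditional entropy, state occupation entropy).

    posteriors[t][s] is the marginal probability of state s at frame t;
    each row is assumed to sum to 1.
    """
    num_frames = len(posteriors)
    num_states = len(posteriors[0])
    # Low when every frame is assigned to some state with high confidence.
    cond_entropy = sum(entropy(row) for row in posteriors) / num_frames
    # Average posterior mass that each state receives across all frames.
    occupation = [
        sum(row[s] for row in posteriors) / num_frames for s in range(num_states)
    ]
    # High when the states are used evenly rather than collapsing onto one.
    occ_entropy = entropy(occupation)
    return cond_entropy, occ_entropy


def scalarized_objective(posteriors, weight=1.0):
    """Hypothetical weighted-sum scalarization, to be minimized (an assumption)."""
    cond_entropy, occ_entropy = pretraining_terms(posteriors)
    return cond_entropy - weight * occ_entropy


def next_layer_input(observations, posteriors):
    """Append each frame's marginal probabilities to its observation vector."""
    return [list(obs) + list(post) for obs, post in zip(observations, posteriors)]


# Toy posteriors over two states for two frames.
confident_balanced = [[1.0, 0.0], [0.0, 1.0]]  # confident, both states used
collapsed = [[1.0, 0.0], [1.0, 0.0]]           # confident, one state only
uncertain = [[0.5, 0.5], [0.5, 0.5]]           # both states used, no confidence
```

Under this scalarization the confident and balanced assignment scores lower (better) than either the collapsed or the uncertain one, which is the trade-off the two entropy terms are meant to capture.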