Deep learning has been successfully applied to perform non-linear embedding. In this paper, we present supervised embedding techniques that use a deep network to collapse classes. The network is pre-trained using a stack of RBMs, and finetuned using approaches that try to collapse classes. The finetuning is inspired by ideas from NCA, but it uses a Student t-distribution to model the similarities of data points belonging to the same class in the embedding. We investigate two types of objective functions: deep t-distributed MCML (dt-MCML) and deep tdistributed NCA (dt-NCA). Our experiments on two handwritten digit data sets reveal the strong performance of dt-MCML in supervised parametric data visualization, whereas dt-NCA outperforms alternative techniques when embeddings with more than two or three dimensions are constructed, e.g., to obtain good classification performances. Overall, our results demonstrate the advantage of using a deep architecture and a heavy-tailed t-distribution ...