We propose a novel semi-supervised method for building a statistical model that represents the relationship between sounds and text labels (“tags”). The proposed method, named semi-supervised canonical density estimation, makes use of unlabeled sound data in two ways: 1) a low-dimensional latent space representing topics of sounds is extracted by a semi-supervised variant of canonical correlation analysis, and 2) topic models are learned by multi-class extension of semi-supervised kernel density estimation in the topic space. Real-world audio tagging experiments indicate that our proposed method improves the accuracy even when only a small number of labeled sounds are available.