D-LDA: A Topic Modeling Approach without Constraint Generation for Semi-Defined Classification

Fuzhen Zhuang, Ping Luo, Zhiyong Shen, Qing He, Yuhong Xiong, Zhongzhi Shi

HP Laboratories
HPL-2010-162

Keywords: Semi-defined classification, Topic modeling, Gibbs sampling, Semi-supervised clustering

We study what we call semi-defined classification, which deals with categorization tasks where the taxonomy of the data is not well defined in advance. It is motivated by real-world applications in which the unlabeled data may come from unknown classes in addition to the known classes of the labeled data. Given the unlabeled data, our goal is not only to identify the instances belonging to the known classes, but also to cluster the remaining data into other meaningful groups. It differs from traditional semi-supervised clustering in that in semi-supervised clustering the supervision knowledge is too far from being representative of a target classification, while in semi-defined ...