Bayesian text classifiers face a common issue which is referred to as data sparsity problem, especially when the size of training data is very small. The frequently used Laplacian...
This paper offers a local distributed algorithm for multivariate regression in large peer-to-peer environments. The algorithm is designed for distributed inferencing, data compact...
In this paper we present the Dynamic Grow-Shrink Inference-based Markov network learning algorithm (abbreviated DGSIMN), which improves on GSIMN, the state-ofthe-art algorithm for...
Multi-label learning refers to the problems where an instance can be assigned to more than one category. In this paper, we present a novel Semi-supervised algorithm for Multi-labe...
Gang Chen, Yangqiu Song, Fei Wang, Changshui Zhang
Cross validation allows models to be tested using the full training set by means of repeated resampling; thus, maximizing the total number of points used for testing and potential...
Optimally designing the location of training input points (active learning) and choosing the best model (model selection) are two important components of supervised learning and h...
Given an author-conference network that evolves over time, which are the conferences that a given author is most closely related with, and how do they change over time? Large time...
Hanghang Tong, Spiros Papadimitriou, Philip S. Yu,...
We describe a fast algorithm for kernel discriminant analysis, empirically demonstrating asymptotic speed-up over the previous best approach. We achieve this with a new pattern of...
This paper studies the ensemble selection problem for unsupervised learning. Given a large library of different clustering solutions, our goal is to select a subset of solutions t...
Imbalanced class problems appear in many real applications of classification learning. We propose a novel sampling method to improve bagging for data sets with skewed class distri...