The problem of identifying approximately duplicate records in databases is an essential step for data cleaning and data integration processes. Most existing approaches have relied...
A new hierarchical nonparametric Bayesian model is proposed for the problem of multitask learning (MTL) with sequential data. Sequential data are typically modeled with a hidden M...
Imbalanced data learning has recently begun to receive much attention from research and industrial communities as traditional machine learners no longer give satisfactory results. ...
Boosting is a simple yet powerful modeling technique that is used in many machine learning and data mining related applications. In this paper, we propose a novel scale-space based...
—Information about individuals on publicly available web sites stands as a valuable, yet unorganized, data source. Turning such an enormous data source into a “database” is h...