Privacy considerations are highly significant in applications of data mining. It is very important that the privacy of individual parties not be exposed when data mining te...
Sampling has been recognized as an important technique to improve the efficiency of clustering. However, with sampling applied, those points which are not sampled will not have t...
It is estimated that ninety percent of the world’s species have yet to be discovered and described. The main reason for the slow pace of new species description is that the scie...
Yixin Chen, Henry L. Bart Jr., Shuqing Huang, Huim...
There has been an increasing number of independently proposed randomization methods in different stages of decision tree construction to build multiple trees. Randomized decision tre...
Wei Fan, Ed Greengrass, Joe McCloskey, Philip S. Y...
In this paper, we formulate the problem of summarization of a dataset of transactions with categorical attributes as an optimization problem involving two objective functions - co...
Significant vulnerabilities have recently been identified in collaborative filtering recommender systems. Researchers have shown that attackers can manipulate a system’s reco...
Robin D. Burke, Bamshad Mobasher, Runa Bhaumik, Ch...
Data mining algorithms face the challenge of dealing with an increasing number of complex objects. For graph data, a whole toolbox of data mining algorithms becomes available b...
Recommendation algorithms aim to propose “next” pages to a user based on her current visit and the navigational patterns of past users. In the vast majority of related algor...
The problem of record linkage focuses on determining whether two object descriptions refer to the same underlying entity. Addressing this problem effectively has many practical ap...
We propose an algorithm to construct classification models with a mixture of kernels from labeled and unlabeled data. The derived classifier is a mixture of models, each based o...