Abstract. In this work, we propose two approaches of hiding predictive association rules where the data sets are horizontally distributed and owned by collaborative but non-trustin...
Abstract. Increasingly huge RDF data sets are being published on the Web. Currently, they use different syntaxes of RDF, contain high levels of redundancy and have a plain indivisi...
Several methods for automatically generating labeled examples that can be used as training data for WSD systems have been proposed, including a semisupervised approach based on re...
Abstract Medical data sets consist of a huge amount of data organized in instances, where each one contains several attributes. The quality of the models obtained from a database s...
The present paper studies the influence of two distinct factors on the performance of some resampling strategies for handling imbalanced data sets. In particular, we focus on the n...
We present a new approach to semi-supervised anomaly detection. Given a set of training examples believed to come from the same distribution or class, the task is to learn a model ...
The prevailing approach to evaluating classifiers in the machine learning community involves comparing the performance of several algorithms over a series of usually unrelated data...
We pose the development of cognitively plausible models of human language processing as a challenge for computational linguistics. Existing models can only deal with isolated phen...
Real world data mining applications must address the issue of learning from imbalanced data sets. The problem occurs when the number of instances in one class greatly outnumbers t...
Many real world systems can be modeled as networks or graphs. Clustering algorithms that help us to organize and understand these networks are usually referred to as, graph based c...