Abstract. We present a method for using aligned ontologies to merge taxonomically organized data sets that have apparently compatible schemas, but potentially different semantics f...
Abstract. Flow-based intrusion detection has recently become a promising security mechanism in high-speed networks (1-10 Gbps). Despite the wealth of contributions in this field...
Anna Sperotto, Ramin Sadre, Frank van Vliet, Aiko ...
The collection of behavior protocols is a common practice in human factors research, but the analysis of these large data sets has always been a tedious and time-consuming process....
Walter C. Mankowski, Peter Bogunovich, Ali Shokouf...
When training a Support Vector Machine (SVM), selecting the training data set becomes an important issue, since overfitting can occur with a large number of training da...
We solve the problem of record linkage between databases where record fields are mixed and permuted in different ways. The solution method uses a conditional random fields model...
Abstract. SOMs have proven to be a very powerful tool for data analysis. However, comparing multiple SOMs trained on the same data set using different parameters or initialisation...
Rudolf Mayer, Robert Neumayer, Doris Baum, Andreas...
This paper proposes a new approach to the challenging open-set language detection task. Most state-of-the-art approaches make use of data sources with several out-of-set languages...
Mohamed Faouzi BenZeghiba, Jean-Luc Gauvain, Lori ...
Privacy models such as k-anonymity and ℓ-diversity typically offer an aggregate or scalar notion of the privacy property that holds collectively on the entire anonymized data set....
Supervised classification methods have been shown to be very effective for a large number of applications. They require a training data set whose instances are labeled to indicate...
Partitioning is an important step in several database algorithms, including sorting, aggregation, and joins. Partitioning is also fundamental for dividing work into equal-sized (o...