Sequence segmentation is a central problem in the analysis of sequential and time-series data. In this paper we introduce and study a novel variation of the segmentation proble...
In recent years, there have been some interesting studies on predictive modeling in data streams. However, most such studies assume relatively balanced and stable data streams but...
We introduce a new approach for Clustering and Aggregating Relational Data (CARD). We assume that data is available in a relational form, where we only have information about the ...
We propose an unsupervised approach to learn associations between continuous-valued attributes from different modalities. These associations are used to construct a multi-modal t...
Abnormal events, such as security attacks, misconfigurations, or electricity failures, could have severe consequences for the normal operation of the Border Gateway Protocol (...
Dejing Dou, Jun Li, Han Qin, Shiwoong Kim, Sheng Z...
Data perturbation is a popular technique for privacy-preserving data mining. The major challenge of data perturbation is balancing privacy protection and data quality, which are no...
The widespread use of email has raised serious privacy concerns. A critical issue is how to prevent email information leaks, i.e., when a message is accidentally addressed to non-...
We study a class of algorithms that speed up the training process of support vector machines (SVMs) by returning an approximate SVM. We focus on algorithms that reduce the size of...
Finding discords in time series databases is an important problem in a great variety of applications, such as space shuttle telemetry, mechanical industry, biomedicine, and financ...
Yingyi Bu, Oscar Tat-Wing Leung, Ada Wai-Chee Fu, ...
Decision lists (or ordered rule sets) have two attractive properties compared to unordered rule sets: they require a simpler classification procedure and they allow for a more co...