As data mining techniques are increasingly applied to non-traditional domains, existing approaches for finding frequent itemsets cannot be used because they cannot model the req...
Clustering methods usually require knowing the best number of clusters, or another parameter such as a threshold, which is not always easy to provide. This paper proposes a new graph-b...
In this work we propose a novel approach to anomaly detection in streaming communication data. We first build a stochastic model for the system based on temporal communication pa...
A fundamental task of data analysis is comprehending what distinguishes clusters found within the data. We present the problem of mining distinguishing sets, which seeks to find s...
The entity resolution (ER) problem, which identifies duplicate entities that refer to the same real world entity, is essential in many applications. In this paper, in particular,...
Byung-Won On, Ergin Elmacioglu, Dongwon Lee, Jaewo...