An established method to detect concept drift in data streams is to perform statistical hypothesis testing on the multivariate data in the stream. Statistical decision theory off...
In this paper, we study the problem of how to generate synthetic graphs matching various properties of a real social network with two applications, privacy preserving social netwo...
Most data mining operations include an integral search component at their core. For example, the performance of similarity search or classification based on Nearest Neighbors is ...
This paper is concerned with the task of travel-time prediction for an arbitrary origin-destination pair on a map. Unlike most of the existing studies, which focus only on a parti...
In active learning, one attempts to maximize classifier performance for a given number of labeled training points by allowing the active learning algorithm to choose which points...
The annotation of words and phrases by ontology concepts is extremely helpful for semantic interpretation. However many ontologies, e.g. WordNet, are too fine-grained and even hu...
Anomaly detection in multivariate time series is an important data mining task with applications to ecosystem modeling, network traffic monitoring, medical diagnosis, and other d...
Christopher Potter, Haibin Cheng, Pang-Ning Tan, S...
As the amount of textual information grows explosively in various kinds of business systems, it becomes more and more desirable to analyze both structured data records and unstruc...
To obtain correlated and complementary information contained in text mining and bibliometrics, hybrid clustering to incorporate textual content and citation information has become...
Bart De Moor, Frizo A. L. Janssens, Shi Yu, Wolfga...
Document collections evolve over time, new topics emerge and old ones decline. At the same time, the terminology evolves as well. Much literature is devoted to topic evolution in ...