We present a new approach for dealing with distribution change and concept drift when learning from data sequences that may vary with time. We use sliding windows whose size, inst...
Consensus clustering has emerged as one of the principal clustering problems in the data mining community. In recent years the theoretical computer science community has generated...
A number of real-world domains such as social networks and e-commerce involve heterogeneous data that describes relations between multiple classes of entities. Understanding the n...
Topic modeling techniques have widespread use in text data mining applications. Some applications use batch models, which perform clustering on the document collection in aggregat...
Logistic models are arguably one of the most widely used data analysis techniques. In this paper, we present analyses focussing on two important aspects of logistic models—its r...
In many applications, we monitor data obtained from multiple streaming sources for collective decision making. The task presents several challenges. First, data in sensor networks...
We study the joint feature selection problem when learning multiple related classification or regression tasks. By imposing an automatic relevance determination prior on the hypo...
Tao Xiong, Jinbo Bi, R. Bharat Rao, Vladimir Cherk...
String data is especially important in the privacy preserving data mining domain because most DNA and biological data is coded as strings. In this paper, we will discuss a new met...
In recent years, privacy preserving data mining has become very important because of the proliferation of large amounts of data on the internet. Many data sets are inherently high...
In recent years, random projection has been used as a valuable tool for performing dimensionality reduction of high dimensional data. Starting with the seminal work of Johnson and...