SDM
2007
SIAM
Learning from Time-Changing Data with Adaptive Windowing
We present a new approach for dealing with distribution change and concept drift when learning from data sequences that may vary with time. We use sliding windows whose size, inst...
Albert Bifet, Ricard Gavaldà
Are approximation algorithms for consensus clustering worthwhile?
Consensus clustering has emerged as one of the principal clustering problems in the data mining community. In recent years the theoretical computer science community has generated...
Michael Bertolacci, Anthony Wirth
Multi-way Clustering on Relation Graphs
A number of real-world domains such as social networks and e-commerce involve heterogeneous data that describes relations between multiple classes of entities. Understanding the n...
Arindam Banerjee, Sugato Basu, Srujana Merugu
Topic Models over Text Streams: A Study of Batch and Online Unsupervised Learning
Topic modeling techniques have widespread use in text data mining applications. Some applications use batch models, which perform clustering on the document collection in aggregat...
Arindam Banerjee, Sugato Basu
An Analysis of Logistic Models: Exponential Family Connections and Online Performance
Logistic models are arguably one of the most widely used data analysis techniques. In this paper, we present analyses focussing on two important aspects of logistic models—its r...
Arindam Banerjee
Load Shedding in Classifying Multi-Source Streaming Data: A Bayes Risk Approach
In many applications, we monitor data obtained from multiple streaming sources for collective decision making. The task presents several challenges. First, data in sensor networks...
Yijian Bai, Haixun Wang, Carlo Zaniolo
Probabilistic Joint Feature Selection for Multi-task Learning
We study the joint feature selection problem when learning multiple related classification or regression tasks. By imposing an automatic relevance determination prior on the hypo...
Tao Xiong, Jinbo Bi, R. Bharat Rao, Vladimir Cherk...
On Anonymization of String Data
String data is especially important in the privacy preserving data mining domain because most DNA and biological data is coded as strings. In this paper, we will discuss a new met...
Charu C. Aggarwal, Philip S. Yu
On Privacy-Preservation of Text and Sparse Binary Data with Sketches
In recent years, privacy preserving data mining has become very important because of the proliferation of large amounts of data on the internet. Many data sets are inherently high...
Charu C. Aggarwal, Philip S. Yu
On Point Sampling Versus Space Sampling for Dimensionality Reduction
In recent years, random projection has been used as a valuable tool for performing dimensionality reduction of high dimensional data. Starting with the seminal work of Johnson and...
Charu C. Aggarwal