Sciweavers

SDM
2007
SIAM
A PAC Bound for Approximate Support Vector Machines
We study a class of algorithms that speed up the training process of support vector machines (SVMs) by returning an approximate SVM. We focus on algorithms that reduce the size of...
Dongwei Cao, Daniel Boley
SDM
2007
SIAM
WAT: Finding Top-K Discords in Time Series Database
Finding discords in a time series database is an important problem in a great variety of applications, such as space shuttle telemetry, mechanical industry, biomedicine, and financ...
Yingyi Bu, Oscar Tat-Wing Leung, Ada Wai-Chee Fu, ...
SDM
2007
SIAM
Maximizing the Area under the ROC Curve with Decision Lists and Rule Sets
Decision lists (or ordered rule sets) have two attractive properties compared to unordered rule sets: they require a simpler classification procedure and they allow for a more co...
Henrik Boström
SDM
2007
SIAM
Learning from Time-Changing Data with Adaptive Windowing
We present a new approach for dealing with distribution change and concept drift when learning from data sequences that may vary with time. We use sliding windows whose size, inst...
Albert Bifet, Ricard Gavaldà
SDM
2007
SIAM
Are approximation algorithms for consensus clustering worthwhile?
Consensus clustering has emerged as one of the principal clustering problems in the data mining community. In recent years the theoretical computer science community has generated...
Michael Bertolacci, Anthony Wirth
SDM
2007
SIAM
Multi-way Clustering on Relation Graphs
A number of real-world domains such as social networks and e-commerce involve heterogeneous data that describes relations between multiple classes of entities. Understanding the n...
Arindam Banerjee, Sugato Basu, Srujana Merugu
SDM
2007
SIAM
Topic Models over Text Streams: A Study of Batch and Online Unsupervised Learning
Topic modeling techniques have widespread use in text data mining applications. Some applications use batch models, which perform clustering on the document collection in aggregat...
Arindam Banerjee, Sugato Basu
SDM
2007
SIAM
An Analysis of Logistic Models: Exponential Family Connections and Online Performance
Logistic models are arguably one of the most widely used data analysis techniques. In this paper, we present analyses focussing on two important aspects of logistic models—its r...
Arindam Banerjee
SDM
2007
SIAM
Load Shedding in Classifying Multi-Source Streaming Data: A Bayes Risk Approach
In many applications, we monitor data obtained from multiple streaming sources for collective decision making. The task presents several challenges. First, data in sensor networks...
Yijian Bai, Haixun Wang, Carlo Zaniolo
SDM
2007
SIAM
Probabilistic Joint Feature Selection for Multi-task Learning
We study the joint feature selection problem when learning multiple related classification or regression tasks. By imposing an automatic relevance determination prior on the hypo...
Tao Xiong, Jinbo Bi, R. Bharat Rao, Vladimir Cherk...