
SDM
2010
SIAM
Scalable Tensor Factorizations with Missing Data
The problem of missing data is ubiquitous in domains such as biomedical signal processing, network traffic analysis, bibliometrics, social network analysis, chemometrics, computer...
Evrim Acar, Daniel M. Dunlavy, Tamara G. Kolda, Mo...
SDM
2010
SIAM
Predictive Modeling with Heterogeneous Sources
Lack of labeled training examples is a common problem for many applications. At the same time, there is usually an abundance of labeled data from related tasks. But they have diff...
Xiaoxiao Shi, Qi Liu, Wei Fan, Qiang Yang, Philip ...
SDM
2010
SIAM
A Permutation Approach to Validation
We give a permutation approach to validation (estimation of out-sample error). One typical use of validation is model selection. We establish the legitimacy of the proposed permut...
Malik Magdon-Ismail, Konstantin Mertsalov
SDM
2010
SIAM
The Generalized Dimensionality Reduction Problem
The dimensionality reduction problem has been widely studied in the database literature because of its application for concise data representation in a variety of database applica...
Charu C. Aggarwal
SDM
2010
SIAM
Direct Density Ratio Estimation with Dimensionality Reduction
Methods for directly estimating the ratio of two probability density functions without going through density estimation have been actively explored recently since they can be used...
Masashi Sugiyama, Satoshi Hara, Paul von Büna...
SDM
2010
SIAM
Confidence-Based Feature Acquisition to Minimize Training and Test Costs
We present Confidence-based Feature Acquisition (CFA), a novel supervised learning method for acquiring missing feature values when there is missing data at both training and test...
Marie desJardins, James MacGlashan, Kiri L. Wagsta...
SDM
2010
SIAM
Making k-means Even Faster
The k-means algorithm is widely used for clustering, compressing, and summarizing vector data. In this paper, we propose a new acceleration for exact k-means that gives the same a...
Greg Hamerly
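The abstract above is cut off before it describes the method. As a hedged illustration of what an *exact* k-means acceleration means in general, this sketch compares plain Lloyd iterations with a variant that keeps, for each point, an upper bound on the distance to its assigned center and one lower bound on the distance to any other center. It skips the full distance scan whenever the triangle inequality proves the assignment cannot change. It follows the general bound-based pruning idea and is not claimed to reproduce the paper's algorithm; `lloyd`, `bounded_kmeans`, and `move_centers` are hypothetical names.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points stored as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def move_centers(points, assign, centers):
    """Move each center to the mean of its members; an empty cluster keeps its old center."""
    dim = len(points[0])
    new = []
    for j, c in enumerate(centers):
        members = [p for p, a in zip(points, assign) if a == j]
        if members:
            new.append(tuple(sum(p[d] for p in members) / len(members) for d in range(dim)))
        else:
            new.append(c)
    return new


def lloyd(points, centers, iters):
    """Baseline Lloyd's k-means: compares every point with every center each iteration."""
    centers = list(centers)
    assign = []
    for _ in range(iters):
        assign = [min(range(len(centers)), key=lambda j: dist(p, centers[j])) for p in points]
        centers = move_centers(points, assign, centers)
    return assign, centers


def bounded_kmeans(points, centers, iters):
    """Hypothetical bound-based variant (needs k >= 2): same output as lloyd(), but skips
    points whose bounds prove the assigned center is still the nearest."""
    centers = list(centers)
    n, k = len(points), len(centers)
    assign, upper, lower = [0] * n, [0.0] * n, [0.0] * n
    evals = 0  # point-to-center distance computations actually performed

    def full_scan(i):
        # Exact nearest and second-nearest centers; resets both bounds to exact values.
        nonlocal evals
        ranked = sorted((dist(points[i], c), j) for j, c in enumerate(centers))
        evals += k
        upper[i], assign[i] = ranked[0]
        lower[i] = ranked[1][0]

    for it in range(iters):
        # s[j] = half the distance from center j to its closest other center.
        s = [0.5 * min(dist(centers[j], centers[m]) for m in range(k) if m != j)
             for j in range(k)]
        for i in range(n):
            if it == 0:
                full_scan(i)
                continue
            bound = max(s[assign[i]], lower[i])
            if upper[i] <= bound:
                continue  # assigned center is provably still the nearest
            upper[i] = dist(points[i], centers[assign[i]])  # tighten the upper bound
            evals += 1
            if upper[i] > bound:
                full_scan(i)
        new_centers = move_centers(points, assign, centers)
        shift = [dist(c, nc) for c, nc in zip(centers, new_centers)]
        max_shift = max(shift)
        for i in range(n):
            upper[i] += shift[assign[i]]  # the assigned center moved at most this far
            lower[i] -= max_shift         # no other center moved farther than this
        centers = new_centers
    return assign, centers, evals


random.seed(0)
pts = [(random.random(), random.random()) for _ in range(300)]
a1, c1 = lloyd(pts, pts[:4], 10)
a2, c2, evals = bounded_kmeans(pts, pts[:4], 10)
print(a1 == a2, c1 == c2, evals, "vs", 300 * 4 * 10)
```

Both functions should return the same assignments and centers. `evals` counts the point-to-center distances the bounded version actually computed, versus `n * k * iters` for the baseline; the saving depends on the data and is not a measured result from the paper.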
SDM
2010
SIAM
Evaluating Query Result Significance in Databases via Randomizations
Many sorts of structured data are commonly stored in a multi-relational format of interrelated tables. Under this relational model, exploratory data analysis can be done by using ...
Markus Ojala, Gemma C. Garriga, Aristides Gionis, ...
SDM
2010
SIAM
Temporal Collaborative Filtering with Bayesian Probabilistic Tensor Factorization
Real-world relational data are seldom stationary, yet traditional collaborative filtering algorithms generally rely on this assumption. Motivated by our sales prediction problem, ...
Liang Xiong, Xi Chen, Tzu-Kuo Huang, Jeff Schneide...
SDM
2010
SIAM
Adaptive Informative Sampling for Active Learning
Many approaches to active learning involve periodically training one classifier and choosing data points with the lowest confidence. An alternative approach is to periodically cho...
Zhenyu Lu, Xindong Wu, Josh Bongard
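The abstract contrasts its proposal with the common loop of periodically retraining one classifier and querying the points it is least confident about. A minimal sketch of that baseline (least-confidence uncertainty sampling) follows. It assumes a toy 1-D nearest-centroid classifier with softmax scores as a hypothetical stand-in for a real model. `train_centroids`, `predict_proba`, `least_confident`, and `active_learning` are illustrative names, not the paper's method.

```python
import math


def train_centroids(labeled):
    """Hypothetical toy classifier: the mean 1-D position of each class."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict_proba(centroids, x):
    """Class probabilities from a softmax over negative distances to each centroid."""
    scores = [math.exp(-abs(x - centroids[c])) for c in sorted(centroids)]
    total = sum(scores)
    return [s / total for s in scores]


def least_confident(model, pool, batch_size):
    """Indices of the pool points whose top predicted probability is lowest."""
    confidence = [max(predict_proba(model, x)) for x in pool]
    return sorted(range(len(pool)), key=lambda i: confidence[i])[:batch_size]


def active_learning(labeled, pool, oracle, rounds, batch_size):
    """Periodically retrain the classifier and query labels for its least-confident points."""
    labeled, pool = list(labeled), list(pool)
    for _ in range(rounds):
        model = train_centroids(labeled)  # retrain on every label gathered so far
        picked = set(least_confident(model, pool, batch_size))
        labeled += [(x, oracle(x)) for i, x in enumerate(pool) if i in picked]
        pool = [x for i, x in enumerate(pool) if i not in picked]
    return labeled, pool


# Hypothetical oracle: points above 0.5 belong to class 1.
labeled, pool = active_learning([(0.0, 0), (1.0, 1)], [0.1, 0.48, 0.9, 0.6, 0.3],
                                lambda x: int(x > 0.5), rounds=1, batch_size=2)
print(labeled, pool)  # → [(0.0, 0), (1.0, 1), (0.48, 0), (0.6, 1)] [0.1, 0.9, 0.3]
```

The two points nearest the decision boundary (0.48 and 0.6) have the flattest predicted probabilities, so they are the ones queried; the alternative the abstract goes on to describe is not modeled here.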