ECML 2007, Springer

Principal Component Analysis for Large Scale Problems with Lots of Missing Values

Abstract. Principal component analysis (PCA) is a well-known classical data analysis technique. There are a number of algorithms for solving the problem, some scaling better than others to problems with high dimensionality. They also differ in their ability to handle missing values in the data. We study a case where the data are high-dimensional and a majority of the values are missing. In the case of very sparse data, overfitting becomes a severe problem even in simple linear models such as PCA. We propose an algorithm based on speeding up a simple principal subspace rule, and extend it to use regularization and variational Bayesian (VB) learning. Experiments with the Netflix data confirm that the proposed algorithm is much faster than any of the compared methods, and that the VB-PCA method provides more accurate predictions for new data than traditional PCA or regularized PCA.
Tapani Raiko, Alexander Ilin, Juha Karhunen
Type Conference
Year 2007
Where ECML
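
The abstract's core idea is to fit a low-rank linear model Y ≈ AS to the observed entries only, with regularization to keep very sparse data from overfitting. The sketch below illustrates that idea with a plain alternating regularized least-squares update; it is not the authors' speeded-up subspace rule or their VB-PCA variant, and all names in it (pca_missing_values, reg, n_iters) are hypothetical.

```python
import numpy as np

def pca_missing_values(Y, mask, n_components, reg=0.1, n_iters=50, seed=0):
    """Regularized PCA for a (d, n) data matrix Y with missing entries.

    mask is a (d, n) array with 1 where Y is observed and 0 where it is
    missing. The model Y ~= A @ S is fitted to the observed entries only,
    by alternating regularized least squares (an illustrative sketch).
    """
    rng = np.random.default_rng(seed)
    d, n = Y.shape
    c = n_components
    A = 0.01 * rng.standard_normal((d, c))   # loadings, d x c
    S = 0.01 * rng.standard_normal((c, n))   # principal components, c x n
    Y = np.where(mask, Y, 0.0)               # zero out the unobserved entries
    obs = mask.astype(bool)

    for _ in range(n_iters):
        # Update each component vector s_j from the rows observed in column j.
        for j in range(n):
            Aj = A[obs[:, j]]
            S[:, j] = np.linalg.solve(Aj.T @ Aj + reg * np.eye(c),
                                      Aj.T @ Y[obs[:, j], j])
        # Update each loading row a_i from the columns observed in row i.
        for i in range(d):
            Si = S[:, obs[i]]
            A[i] = np.linalg.solve(Si @ Si.T + reg * np.eye(c),
                                   Si @ Y[i, obs[i]])
    return A, S
```

Predictions for the unobserved entries are then read from the reconstruction A @ S. The penalty reg stands in for the regularization the abstract mentions; VB-PCA instead places distributions over A and S, which the abstract reports gives more accurate predictions on new data than traditional or regularized PCA.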