Krylov Subspace Descent for Deep Learning


In this paper, we propose a second-order optimization method to learn models where both the dimensionality of the parameter space and the number of training samples are high. In our method, we construct on each iteration a Krylov subspace formed by the gradient and an approximation to the Hessian matrix, and then use a subset of the training data samples to optimize over this subspace. As with the Hessian Free (HF) method of Martens (2010), the Hessian matrix is never explicitly constructed, and is computed using a subset of data. In practice, as in HF, we typically use a positive definite substitute for the Hessian matrix such as the Gauss-Newton matrix. We investigate the effectiveness of our proposed method on deep neural networks, and compare its performance to widely used methods such as stochastic gradient descent, conjugate gradient descent and L-BFGS, and also to HF. Our method leads to faster convergence than either L-BFGS or HF, and generally performs better than either of t...
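The core idea in the abstract, building a Krylov subspace from the gradient and Hessian-vector products and then optimizing only over that subspace, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy quadratic loss, the helper names (`krylov_basis`, `subspace_step`, `solve`), and the exact reduced Newton solve, which stands in for the paper's optimization over the subspace on a subset of training data. The Hessian is accessed only through the `hvp` callback, mirroring the point that it is never explicitly constructed.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def axpy(alpha, u, v):
    """Return alpha * u + v for plain Python lists."""
    return [alpha * a + b for a, b in zip(u, v)]

def krylov_basis(hvp, g, k):
    """Orthonormal basis spanning {g, Hg, ..., H^(k-1) g}, using only H-vector products."""
    basis = []
    v = g
    for _ in range(k):
        for q in basis:                      # modified Gram-Schmidt
            v = axpy(-dot(q, v), q, v)
        norm = math.sqrt(dot(v, v))
        if norm < 1e-12:                     # subspace stopped growing
            break
        q = [x / norm for x in v]
        basis.append(q)
        v = hvp(q)                           # next Krylov direction
    return basis

def solve(A, b):
    """Solve a small dense system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def subspace_step(hvp, grad, x, k):
    """One step: minimize the local quadratic model over the k-dimensional Krylov subspace."""
    g = grad(x)
    V = krylov_basis(hvp, g, k)
    HV = [hvp(q) for q in V]
    A = [[dot(qi, hq) for hq in HV] for qi in V]   # reduced Hessian V^T H V
    b = [-dot(q, g) for q in V]                    # reduced negative gradient
    coeffs = solve(A, b)
    for a, q in zip(coeffs, V):
        x = axpy(a, q, x)
    return x

# Toy problem (assumed): f(x) = 0.5 x^T H x - c^T x with a positive definite H.
H = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
c = [1.0, 2.0, 3.0]

def hvp(v):
    return [dot(row, v) for row in H]

def grad(x):
    return [hx - ci for hx, ci in zip(hvp(x), c)]

def loss(x):
    return 0.5 * dot(x, hvp(x)) - dot(c, x)

x0 = [0.0, 0.0, 0.0]
x_full = subspace_step(hvp, grad, x0, 3)   # subspace spans the whole space
x_one = subspace_step(hvp, grad, x0, 1)    # one direction: exact line search along g
```

On this toy quadratic, a subspace of full dimension recovers the minimizer in one step, while `k = 1` reduces to steepest descent with an exact line search. For a real network, `hvp` would be replaced by a product with a positive definite substitute such as the Gauss-Newton matrix, and the reduced problem would no longer be solved in closed form.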

Oriol Vinyals, Daniel Povey


Conjugate Gradient | Gradient Descent | JMLR 2012 | Krylov Subspace | Programming Languages


Post Info

Added	27 Sep 2012
Updated	27 Sep 2012
Type	Journal
Year	2012
Where	JMLR
Authors	Oriol Vinyals, Daniel Povey
