Deep Learning Made Easier by Linear Transformations in Perceptrons

We transform the outputs of each hidden neuron in a multi-layer perceptron network to have zero output and zero slope on average, and use separate shortcut connections to model the linear dependencies instead. This transformation aims at separating the problems of learning the linear and nonlinear parts of the whole input-output mapping, which has many benefits. We study the theoretical properties of the transformations by noting that they make the Fisher information matrix closer to a diagonal matrix, and thus the standard gradient closer to the natural gradient. We experimentally confirm the usefulness of the transformations by noting that they make basic stochastic gradient learning competitive with state-of-the-art learning algorithms in speed, and that they also seem to help find solutions that generalize better. The experiments include both classification of small images and learning a low-dimensional representation for images by using a deep unsupervised auto-encoder network. The...
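
The zero-output, zero-slope transformation can be illustrated with a minimal sketch. It assumes a tanh nonlinearity transformed as f(b) = tanh(b) + αb + β, where α and β are fitted per hidden unit so that f has zero mean output and zero mean slope over the observed pre-activations b, and a network of the form y = A f(Bx) + Cx in which the shortcut matrix C carries the linear part of the mapping. The parameterization and the helper names (fit_alpha_beta, transformed, transformed_slope, forward) are assumptions made for illustration, not the authors' code.

```python
import math

def fit_alpha_beta(pre_activations):
    """Assumed form f(b) = tanh(b) + alpha*b + beta: choose alpha, beta so that
    f has zero mean output and zero mean slope over one unit's pre-activations."""
    n = len(pre_activations)
    mean_b = sum(pre_activations) / n
    mean_tanh = sum(math.tanh(b) for b in pre_activations) / n
    # tanh'(b) = 1 - tanh(b)^2, so zero mean slope needs alpha = -mean(tanh'(b))
    mean_slope = sum(1.0 - math.tanh(b) ** 2 for b in pre_activations) / n
    alpha = -mean_slope
    # zero mean output needs mean_tanh + alpha * mean_b + beta = 0
    beta = -mean_tanh - alpha * mean_b
    return alpha, beta

def transformed(b, alpha, beta):
    """Transformed hidden-unit output f(b)."""
    return math.tanh(b) + alpha * b + beta

def transformed_slope(b, alpha):
    """Derivative f'(b) of the transformed hidden unit."""
    return 1.0 - math.tanh(b) ** 2 + alpha

def matvec(M, v):
    """Multiply a matrix given as a list of rows by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def forward(x, A, B, C, alphas, betas):
    """Hypothetical forward pass y = A f(Bx) + Cx: the nonlinear path uses the
    transformed hidden units, the shortcut Cx models the linear dependencies."""
    b = matvec(B, x)
    h = [transformed(bi, a, be) for bi, a, be in zip(b, alphas, betas)]
    nonlinear = matvec(A, h)
    linear = matvec(C, x)
    return [u + w for u, w in zip(nonlinear, linear)]

# Usage: fit alpha and beta for one unit from a batch of pre-activations
batch = [-2.0, -0.5, 0.1, 0.7, 1.5, 3.0]
alpha, beta = fit_alpha_beta(batch)
print(alpha, beta)
```

In this sketch α and β are simply estimated from one batch of pre-activations; how often to re-estimate them during training is left open here. Because each hidden unit no longer contributes a constant offset or an average linear trend, those components have to be represented by the shortcut term Cx, which is the separation of linear and nonlinear parts described above.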

Tapani Raiko, Harri Valpola, Yann LeCun

Hidden Neuron | JMLR 2012 | Natural Gradient | Programming Languages | Zero Slope

Post Info

Added	27 Sep 2012
Updated	27 Sep 2012
Type	Journal
Year	2012
Where	JMLR
Authors	Tapani Raiko, Harri Valpola, Yann LeCun
