Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm

We improve a recent guarantee of Bach and Moulines on the linear convergence of SGD for smooth and strongly convex objectives, reducing a quadratic dependence on the strong convexity to a linear dependence. Furthermore, we show how reweighting the sampling distribution (i.e., importance sampling) is necessary in order to further improve convergence, and we obtain a linear dependence on average smoothness, dominating previous results. More broadly, we discuss how importance sampling for SGD can also improve convergence in other scenarios. Our results are based on a connection between SGD and the randomized Kaczmarz algorithm, which allows us to transfer ideas between the separate bodies of literature studying each of the two methods.
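
To make the connection between SGD and randomized Kaczmarz concrete, here is a minimal sketch in plain Python. It is an illustration and not code from the paper. It assumes a consistent linear system A x = b and samples rows with probability proportional to their squared norm, which is the standard weighted-sampling rule for randomized Kaczmarz. The function name, the example matrix, the iteration count, and the seed are all hypothetical choices made for this sketch.

```python
import random


def randomized_kaczmarz(A, b, iters=2000, seed=0):
    """Approximately solve a consistent system A x = b (illustrative sketch).

    Rows are drawn with probability proportional to their squared
    Euclidean norm, i.e. a non-uniform (importance) sampling distribution.
    """
    rng = random.Random(seed)
    n_cols = len(A[0])
    sq_norms = [sum(a * a for a in row) for row in A]
    x = [0.0] * n_cols
    for _ in range(iters):
        # Importance sampling: pick row i with probability ||a_i||^2 / ||A||_F^2.
        i = rng.choices(range(len(A)), weights=sq_norms, k=1)[0]
        row = A[i]
        residual = sum(a * xj for a, xj in zip(row, x)) - b[i]
        # Project x onto the hyperplane {z : a_i . z = b_i}. The same update is
        # a gradient step on f_i(x) = 0.5 * (a_i . x - b_i)^2 with step size
        # 1 / ||a_i||^2, which is one way to read Kaczmarz as SGD.
        scale = residual / sq_norms[i]
        x = [xj - scale * a for xj, a in zip(x, row)]
    return x


# Hypothetical example: a small overdetermined but consistent system.
A = [[2.0, 1.0], [1.0, 3.0], [4.0, -1.0]]
x_true = [1.0, -2.0]
b = [sum(a * xt for a, xt in zip(row, x_true)) for row in A]
x_hat = randomized_kaczmarz(A, b)
```

Replacing `weights=sq_norms` with uniform weights gives plain uniform-sampling SGD with the same per-row step, which is a simple way to compare the two sampling schemes on a system whose rows have very different norms.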

Deanna Needell, Nathan Srebro, Rachel Ward

Information Technology | MP 2016 |

Post Info

Added	08 Apr 2016
Updated	08 Apr 2016
Type	Journal
Year	2016
Where	MP
Authors	Deanna Needell, Nathan Srebro, Rachel Ward
