Large-scale optimization problems are ubiquitous in machine learning and data analysis, and there is a plethora of algorithms for solving such problems. Many of these algorithms employ sub-sampling, either to speed up computations or to implicitly implement a form of statistical regularization. In this paper, we consider second-order iterative optimization algorithms, i.e., those that use Hessian as well as gradient information, and we provide bounds on the convergence of variants of Newton's method that incorporate uniform sub-sampling as a means to estimate the gradient and/or the Hessian. Our bounds are non-asymptotic, i.e., they hold for a finite number of data points in finite dimensions and for a finite number of iterations. In addition, they are quantitative and depend on quantities related to the problem, e.g., the condition number. Moreover, our algorithms are globally convergent, i.e., they are guaranteed to converge from any initial iterate. Using random matrix concentration inequalities, ...
Farbod Roosta-Khorasani, Michael W. Mahoney
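To make the idea concrete, the following is a minimal sketch of one sub-sampled Newton variant for a finite-sum problem f(x) = (1/n) Σ_i f_i(x): the full gradient is used while the Hessian is estimated from a uniform sub-sample. It is not the paper's exact algorithm; the callables `grad_i` and `hess_i`, and the parameters `sample_size`, `step_size`, and `reg`, are illustrative assumptions, and the paper's convergence guarantees depend on how such quantities are chosen.

```python
import numpy as np

def sub_sampled_newton(grad_i, hess_i, x0, n, sample_size,
                       num_iters=50, step_size=1.0, reg=1e-8, seed=0):
    """Sketch of sub-sampled Newton with uniform Hessian sub-sampling.

    grad_i(i, x) -> gradient of the i-th component f_i at x, shape (d,)
    hess_i(i, x) -> Hessian of the i-th component f_i at x, shape (d, d)
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    d = x.shape[0]
    for _ in range(num_iters):
        # Full gradient (a gradient sub-sample could be used here as well).
        g = np.mean([grad_i(i, x) for i in range(n)], axis=0)
        # Uniformly sub-sample component indices to estimate the Hessian.
        S = rng.choice(n, size=sample_size, replace=False)
        H = np.mean([hess_i(i, x) for i in S], axis=0)
        # Damped Newton step; a small ridge keeps the sampled Hessian invertible.
        p = np.linalg.solve(H + reg * np.eye(d), g)
        x = x - step_size * p
    return x
```

As a usage example under the same assumptions, for regularized least squares one would set `grad_i(i, x) = a_i * (a_i @ x - b_i) + lam * x` and `hess_i(i, x) = np.outer(a_i, a_i) + lam * np.eye(d)` for data rows `a_i` and targets `b_i`, and the sub-sampled Hessian then serves as a cheap surrogate for the full curvature information.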