We show that the optimal complexity of Nesterov's smooth first-order optimization algorithm is preserved when the gradient is only computed up to a small, uniformly bounded er...
Neural networks use neurons of the same type in each layer but such architecture cannot lead to data models of optimal complexity and accuracy. Networks with architectures (number ...