This paper presents a stagewise least square (SLS) loss function for classification. Within each stage it uses a least square form, and across stages it approximates a bounded, monotonic, nonconvex loss function. The SLS loss function offers several benefits: (i) higher generalization accuracy and better scalability than the classical least square loss; (ii) better performance and robustness than convex losses (e.g., the hinge loss of SVM); (iii) computational advantages over nonconvex losses (e.g., ramp loss in learning); (iv) the ability to resist the myopia of Empirical Risk Minimization and to boost the margin without boosting the complexity of the classifier. In addition, it naturally yields a kernel machine that is as sparse as SVM, yet much faster and simpler to train. A fast online learning algorithm with an integrated sparsification procedure is also provided. Experimental results on several benchmarks confirm the advantages of the proposed approach.
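For orientation, the three baseline losses named above can be compared as functions of the margin m = y·f(x). The following is a minimal sketch using common textbook forms, not the paper's SLS loss, whose definition is not given in this abstract. The function names and the ramp parameter `s` (here defaulting to -1) are assumptions made for illustration.

```python
# Illustrative textbook forms of the baseline losses, written as
# functions of the margin m = y * f(x). These are NOT the SLS loss.

def squared_loss(margin):
    # Classical least square loss: penalizes any deviation from margin 1,
    # including confidently correct predictions (margin > 1).
    return (1.0 - margin) ** 2

def hinge_loss(margin):
    # Convex SVM loss: zero once the margin reaches 1, but unbounded
    # for badly misclassified points.
    return max(0.0, 1.0 - margin)

def ramp_loss(margin, s=-1.0):
    # Nonconvex, bounded loss: the hinge loss clipped at 1 - s.
    # The clipping point s is an assumed parameter for this sketch.
    return min(1.0 - s, max(0.0, 1.0 - margin))

if __name__ == "__main__":
    for m in [-3.0, -1.0, 0.0, 1.0, 3.0]:
        print(m, squared_loss(m), hinge_loss(m), ramp_loss(m))
```

At margin 3 the squared loss still charges a penalty while the hinge and ramp losses are zero; at margin -3 the hinge loss keeps growing while the ramp loss is capped. A bounded, monotonic loss of the ramp kind is the sort of target the abstract describes approximating with least square stages.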