Abstract--Recently, sparse approximation has become a preferred method for learning large scale kernel machines. This technique attempts to represent the solution with only a subset of original data points also known as basis vectors, which are usually chosen one by one with a forward selection procedure based on some selection criteria. The computational complexity of several resultant algorithms scales as ( 2) in time and ( ) in memory, where is the number of training points and is the number of basis vectors as well as the steps of forward selection. For some large scale data sets, to obtain a better solution, we are sometimes required to include more basis vectors, which means that is not trivial in this situation. However, the limited computational resource (e.g., memory) prevents us from including too many vectors. To handle this dilemma, we propose to add an ensemble of basis vectors instead of only one at each forward step. The proposed method, closely related to gradient boost...