Optimal Component Analysis (OCA) is a linear method for feature extraction and dimension reduction. It has been widely used in applications such as face and object recognition. The optimal basis of OCA is obtained by solving an optimization problem on a Grassmann manifold. However, one limitation of OCA is that its computational cost becomes heavy when the training set is large, which prevents OCA from being applied efficiently in many real applications. In this paper, a scalable OCA (S-OCA) that uses a two-stage strategy is developed to address this limitation. In the first stage, we cluster the training data using the K-means algorithm and reduce the data to a low-dimensional space. In the second stage, the OCA search is performed in the reduced space and the gradient is updated using a numerical approximation. During the OCA gradient update, instead of using the entire training set, S-OCA randomly chooses a small subset of the training images in each...
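
The subset-based gradient update described above can be illustrated with a minimal sketch. It is not the authors' implementation: the objective `separation_score` is a hypothetical stand-in (a simple between-class to within-class scatter ratio) rather than the OCA performance function, and the function names, the central finite-difference scheme, the tangent projection, and the QR re-orthonormalization are all assumptions made for illustration. The K-means stage is omitted.

```python
import numpy as np


def separation_score(U, X, y):
    """Hypothetical stand-in objective (NOT the OCA performance function):
    between-class scatter divided by within-class scatter after projecting
    the rows of X onto the columns of U."""
    Z = X @ U
    overall_mean = Z.mean(axis=0)
    between, within = 0.0, 0.0
    for label in np.unique(y):
        Zc = Z[y == label]
        class_mean = Zc.mean(axis=0)
        between += len(Zc) * np.sum((class_mean - overall_mean) ** 2)
        within += np.sum((Zc - class_mean) ** 2)
    return between / (within + 1e-12)


def numerical_gradient(U, X, y, eps=1e-5):
    """Central finite-difference approximation of d(score)/dU, entry by entry."""
    grad = np.zeros_like(U)
    for i in range(U.shape[0]):
        for j in range(U.shape[1]):
            step = np.zeros_like(U)
            step[i, j] = eps
            grad[i, j] = (separation_score(U + step, X, y)
                          - separation_score(U - step, X, y)) / (2 * eps)
    return grad


def subset_update(U, X, y, batch_size, step_size, rng):
    """One ascent step whose gradient is estimated from a random subset of X."""
    # Assumption: the subset is drawn uniformly without replacement.
    idx = rng.choice(len(X), size=batch_size, replace=False)
    grad = numerical_gradient(U, X[idx], y[idx])
    # Keep only the part of the gradient orthogonal to span(U), i.e. a
    # tangent direction at U on the Grassmann manifold.
    tangent = grad - U @ (U.T @ grad)
    # Re-orthonormalize with a QR factorization so the columns stay orthonormal.
    Q, _ = np.linalg.qr(U + step_size * tangent)
    return Q


# Usage on synthetic two-class data (illustrative values only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 5)), rng.normal(2.0, 1.0, (20, 5))])
y = np.repeat([0, 1], 20)
U, _ = np.linalg.qr(rng.normal(size=(5, 2)))
for _ in range(10):
    U = subset_update(U, X, y, batch_size=16, step_size=0.01, rng=rng)
```

Because each step evaluates the objective only on `batch_size` samples, the cost of one finite-difference gradient depends on the subset size rather than on the full training set, which is the scalability argument the abstract makes.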