Boosting has been widely applied in computer vision, especially since Viola and Jones's seminal work [23]. The combination of rectangular features with fast, integral-image-enabled computation makes boosting attractive for many vision applications. However, this popular way of applying boosting normally relies on exhaustive feature selection from a very large hypothesis pool, which makes the learning process inefficient. It also poses an additional constraint on applying boosting in an online fashion, where feature re-selection is often necessary because of varying data characteristics, yet impractical because of the huge hypothesis pool. This paper proposes a gradient-based feature selection approach. Given a generally trained feature set and labeled samples, our approach iteratively updates each feature by gradient descent, minimizing the weighted least-squares error between the estimated feature response and the true label. In addition, we i...