In this paper, we propose a novel method to select the most informative subset of features, which has little redundancy and very strong discriminating power. Our proposed approach automatically determines the optimal number of features and selects the best subset accordingly by maximizing the average pairwise informativeness, thus has obvious advantage over traditional filter methods. By relaxing the essential combinatorial optimization problem into the standard quadratic programming problem, the most informative feature subset can be obtained efficiently, and a strategy to dynamically compute the redundancy between feature pairs further greatly accelerates our method through avoiding unnecessary computations of mutual information. As shown by the extensive experiments, the proposed method can successfully select the most informative subset of features, and the obtained classification results significantly outperform the state-of-the-art results on most test datasets.