The proposed feature selection method aims to find a minimal subset of the most informative variables for classification or regression by efficiently approximating the Markov blanket, a set of variables that can shield a given variable from the target. Instead of relying on conditional independence tests or network structure learning, the new method uses the Hilbert-Schmidt Independence Criterion (HSIC) as a measure of dependence among variables in a kernel-induced feature space. This allows effective approximation of Markov blankets that consist of multiple dependent features, rather than being limited to a single feature. In addition, the new method can remove both irrelevant and redundant features simultaneously. This approach to discovering Markov blankets is applicable to both discrete and continuous variables, whereas previous methods cannot be used directly with continuous features and are therefore not applicable to regression problems. Experimental evaluations on synthetic and benchmark ...
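To make the dependence measure concrete, the following is a minimal NumPy sketch of the standard biased empirical HSIC estimator, HSIC(X, Y) ≈ tr(KHLH)/(n-1)^2, where K and L are kernel matrices on the samples of X and Y and H is the centering matrix. The Gaussian kernel and the fixed bandwidth sigma below are illustrative assumptions, not the settings of the proposed method.

```python
import numpy as np

def rbf_kernel(x, sigma=1.0):
    """Gaussian (RBF) kernel matrix for samples stacked along axis 0."""
    x = np.atleast_2d(x).reshape(len(x), -1)
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T   # pairwise squared distances
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2.

    Larger values indicate stronger dependence between x and y; with a
    characteristic kernel such as the Gaussian, the population HSIC is
    zero if and only if x and y are independent.
    """
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    K = rbf_kernel(x, sigma)              # kernel matrix on x
    L = rbf_kernel(y, sigma)              # kernel matrix on y
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# A dependent pair scores much higher than an independent pair.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
print(hsic(x, x ** 2), hsic(x, rng.normal(size=200)))
```

Because this estimator needs only kernel evaluations, it applies unchanged to discrete and continuous variables alike, which is what allows the method to extend Markov blanket discovery to regression problems.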