The Support Feature Machine (SFM) was recently proposed as an approach to feature selection for classification that minimises the zero norm of a separating hyperplane. We propose an extension to linearly non-separable datasets that allows a direct trade-off between the number of misclassified data points and the number of dimensions. Results on toy examples and real-world datasets show that the method identifies relevant features effectively.

Keywords: support feature machine, feature selection, zero norm minimisation, classification.