We treat feature selection and basis selection in a unified framework by introducing the masking matrix. If one considers feature selection as finding a binary mask vector that determines which features participate in the learning process, and similarly, basis selection as finding a binary mask vector that determines which basis vectors are needed for the learning process, then the masking matrix is, in particular, the outer product of the feature masking vector and the basis masking vector. This representation allows for a joint estimation of both features and basis. In addition, it allows one to select features that appear in only part of the basis functions. This joint selection of feature/basis subset is not possible when using feature selection and basis selection algorithms independently. thus, the masking matrix help extend feature and basis selection methods while blurring the lines between them. The problem of searching for a masking matrix is NP-hard and we offer a sub-optim...