A novel algorithm called Average Neighborhood Margin Maximization (ANMM) is proposed for supervised linear feature extraction. For each data point, ANMM pulls the neighboring points with the same class label as close to it as possible, while simultaneously pushing the neighboring points with different labels as far away from it as possible. We show that the features extracted by ANMM separate data from different classes well, and that ANMM avoids the small sample size problem that exists in traditional Linear Discriminant Analysis (LDA). The kernelized (nonlinear) counterpart of ANMM is also established in this paper. Moreover, because in many computer vision applications the data are more naturally represented as higher-order tensors (e.g., images and videos), we develop a tensorized (multilinear) form of ANMM that extracts features directly from tensors. Experimental results on face recognition are presented to show the effectiveness of our method.
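To make the pull/push idea concrete, the following is a minimal sketch of one plausible reading of the linear case. The abstract does not state the objective, so the formulation here is an assumption: for each point, the outer products of its differences to the nearest different-class neighbors are averaged into a scatter matrix `S`, those to the nearest same-class neighbors into a compactness matrix `C`, and the projection is taken as the leading eigenvectors of `S - C`. The function name `anmm_fit` and the parameters `k_same` and `k_diff` are hypothetical and chosen only for illustration.

```python
import numpy as np


def anmm_fit(X, y, n_components, k_same=3, k_diff=3):
    """Illustrative sketch (assumed formulation, not taken from the abstract).

    X: array of shape (n_samples, n_features); y: class labels.
    Returns W of shape (n_features, n_components) with orthonormal columns.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape

    # Pairwise squared Euclidean distances between all samples.
    sq = np.sum(X ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)

    S = np.zeros((d, d))  # scatter: different-class neighbors (to push away)
    C = np.zeros((d, d))  # compactness: same-class neighbors (to pull closer)
    indices = np.arange(n)

    for i in range(n):
        same = np.flatnonzero((y == y[i]) & (indices != i))
        diff = np.flatnonzero(y != y[i])
        for candidates, k, M in ((same, k_same, C), (diff, k_diff, S)):
            if candidates.size == 0:
                continue
            # Keep the k nearest candidates of point i.
            nearest = candidates[np.argsort(D[i, candidates])[:k]]
            delta = X[nearest] - X[i]
            # Average outer product of the difference vectors (in-place update).
            M += delta.T @ delta / nearest.size

    # S - C is symmetric, so eigh applies; it returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(S - C)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order]
```

Under this assumed formulation, `W = anmm_fit(X, y, n_components=1)` followed by `X @ W` projects the samples onto the learned direction. The sketch requires only an eigendecomposition of a symmetric matrix and no matrix inverse, which would be consistent with the claim that the small sample size problem of LDA is avoided.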