Automatic facial feature localization has been a longstanding challenge in computer vision. The difficulty stems from the large variation in facial appearance caused by factors such as position, facial expression, pose, illumination, and background clutter. Support Vector Machines (SVMs) have been a popular statistical tool for facial feature detection. Traditional SVM approaches to facial feature detection typically extract features from images (e.g. multiband filters, SIFT features) and then learn the SVM parameters. Learning the features and the SVM parameters independently can discard information relevant to the classification task. This paper proposes an energy-based framework that jointly performs relevant feature weighting and SVM parameter learning. Preliminary experiments on standard face databases show a significant improvement in speed with our approach.
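As a rough illustration of the joint-learning idea (not the paper's exact formulation), the sketch below minimizes a hinge-loss energy simultaneously over an SVM weight vector, a bias, and a per-feature relevance weight vector. All symbols and hyperparameters (`v`, `lam`, `mu`, `lr`) are assumptions introduced for this example only.

```python
import numpy as np

def joint_svm_feature_weighting(X, y, lam=1e-2, mu=1e-2, lr=0.1, epochs=200, seed=0):
    """Jointly learn SVM parameters (w, b) and per-feature weights v by
    subgradient descent on a regularized hinge-loss energy.
    X: (n, d) feature matrix, y: (n,) labels in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = rng.normal(scale=0.01, size=d)   # SVM weight vector
    v = np.ones(d)                       # feature relevance weights (learned jointly)
    b = 0.0
    for _ in range(epochs):
        scores = X @ (w * v) + b         # decision values on re-weighted features
        active = y * scores < 1.0        # samples violating the margin
        # Regularization subgradients
        grad_w = lam * w
        grad_v = mu * v
        grad_b = 0.0
        if active.any():
            Xa, ya = X[active], y[active]
            # Hinge-loss subgradients, averaged over all n samples
            grad_w += -(ya[:, None] * (Xa * v)).sum(axis=0) / n
            grad_v += -(ya[:, None] * (Xa * w)).sum(axis=0) / n
            grad_b += -ya.sum() / n
        w -= lr * grad_w
        v -= lr * grad_v
        b -= lr * grad_b
    return w, v, b

def predict(X, w, v, b):
    """Classify samples with the jointly learned weights."""
    return np.sign(X @ (w * v) + b)
```

In this toy energy, the feature weights `v` scale each input dimension inside the SVM decision function, so dimensions that do not help separate the classes are driven toward zero by the regularizer while the SVM parameters adapt to the re-weighted features in the same optimization loop.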