Over recent years automated face detection and recognition (FDR) have gained significant attention from the commercial and research sectors. This paper presents an embedded face detection solution aimed at addressing the real-time image processing requirements within a wide range of applications. As face detection is a computationally intensive task, an embedded solution would give rise to opportunities for discrete economical devices that could be applied and integrated into a vast majority of applications. This work focuses on the use of FPGAs as the embedded prototyping technology where the thread of execution is carried out on an embedded softcore processor. Custom instructions have been utilized as a means of applying software/hardware partitioning through which the computational bottlenecks are moved to hardware. A speedup by a factor of 110 was achieved from employing custom instructions and software optimizations.