In face recognition from video streams, both speed and accuracy are vital. The first decision, whether a preprocessed image region represents a human face or not, is often made by a neural network, e.g., in the Viisage-FaceFINDER video surveillance system. We describe the optimization of such a network by a hybrid algorithm that combines evolutionary computation with gradient-based learning. The evolved solutions are considerably faster than an expert-designed architecture without loss of accuracy.
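
To make the idea of a hybrid of evolutionary computation and gradient-based learning concrete, here is a minimal sketch. It is not the algorithm used for the surveillance system. It rests on several assumptions: a toy data set instead of face images, a binary connection mask as the evolved structure, the number of active connections as a stand-in for classification speed, and hypothetical function names and parameter values (`make_data`, `train_and_score`, `fitness`, `evolve`, `penalty`, `flip_prob`). An outer evolutionary loop mutates the mask, and an inner gradient descent trains the weights that each mask permits.

```python
import numpy as np

def make_data(n=200, seed=0):
    """Toy two-class data (assumption): only two of the four inputs matter."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 4))
    y = (X[:, 0] * X[:, 1] > 0).astype(float)
    return X, y

def train_and_score(mask, X, y, steps=300, lr=0.5, seed=1):
    """Gradient-based learning: train weights under a fixed mask, return error rate."""
    rng = np.random.default_rng(seed)  # fixed seed keeps evaluation deterministic
    hidden = mask.shape[1]
    W1 = rng.normal(scale=0.5, size=mask.shape) * mask
    b1 = np.zeros(hidden)
    w2 = rng.normal(scale=0.5, size=hidden)
    b2 = 0.0
    for _ in range(steps):
        h = np.tanh(X @ W1 + b1)
        p = 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))
        d_out = (p - y) / len(y)                  # grad of mean cross-entropy w.r.t. logit
        d_h = np.outer(d_out, w2) * (1.0 - h**2)  # backpropagate through tanh
        W1 -= lr * (X.T @ d_h) * mask             # pruned connections stay at zero
        b1 -= lr * d_h.sum(axis=0)
        w2 -= lr * (h.T @ d_out)
        b2 -= lr * d_out.sum()
    h = np.tanh(X @ W1 + b1)
    p = 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))
    return float(np.mean((p > 0.5) != (y > 0.5)))

def fitness(mask, X, y, penalty=0.005):
    """Lower is better: error rate plus a cost per active connection (speed proxy)."""
    return train_and_score(mask, X, y) + penalty * float(mask.sum())

def evolve(X, y, hidden=6, pop_size=6, generations=5, flip_prob=0.1, seed=2):
    """Evolutionary computation: mutate masks, keep the best (elitist selection)."""
    rng = np.random.default_rng(seed)
    shape = (X.shape[1], hidden)
    population = [np.ones(shape)]  # start from a fully connected reference network
    population += [(rng.random(shape) > 0.3).astype(float) for _ in range(pop_size - 1)]
    scored = [(fitness(m, X, y), m) for m in population]
    for _ in range(generations):
        children = []
        for _, parent in scored:
            flips = rng.random(shape) < flip_prob  # mutation: toggle some connections
            children.append(np.where(flips, 1.0 - parent, parent))
        scored += [(fitness(c, X, y), c) for c in children]
        scored.sort(key=lambda pair: pair[0])
        scored = scored[:pop_size]
    return scored[0]

if __name__ == "__main__":
    X, y = make_data()
    best_fitness, best_mask = evolve(X, y)
    print("active connections:", int(best_mask.sum()), "fitness:", best_fitness)
```

Because the fully connected mask is in the initial population and selection is elitist, the returned fitness can never be worse than that of the fully connected reference under this fitness definition. Whether a sparser network actually wins depends on the data and the penalty weight.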