This paper gives an insight into biometrics used for speaker recognition. Three different biometrics are presented, based on acoustic, geometric lip, and holistic facial features. Experiments are carried out using a corpus from the DAVID audio-visual database. Recognition accuracy is found to be similar in the two domains, audio and visual. The geometric visual feature is based on signature coding of the lip contour, and the holistic feature is based on a mean dynamic signature, a method of capturing the motions of the face during a spoken utterance. Physical biometrics (static measurements) demand only small model sizes, perhaps just a single template, and therefore require less training data. Conversely, behavioral biometrics contain more variation and demand more training data.
Matthew Roach, Jason Brand, John S. Mason