We present a system that retrieves all clips from a meeting archive showing a particular individual speaking, given a single face or voice sample as the query. The system incorporates three novel ideas. First, rather than matching the query against each sample in the archive individually, the samples within a meeting are grouped beforehand, yielding one cluster of samples per individual. The query is then matched against the cluster, exploiting the multiple samples to yield a robust decision. Second, automatic audio-visual association is performed, which enables bi-modal retrieval of clips even when the query is uni-modal. Third, the biometric recognition uses individual-specific score distributions learnt from the clusters within a likelihood-ratio-based decision framework, which obviates the need for explicit score normalization or modality weighting. The resulting system, which is fully automated, achieves 92.6% precision at 90% recall on a dataset of 16 real meetings spanning a total of 13 hours.
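As an illustrative sketch of the third idea (the notation below is ours, not the paper's), a likelihood-ratio formulation combines the two modalities by multiplying their ratios under an independence assumption, so no explicit modality weights are required:

% Hypothetical notation: s_f, s_v are face and voice match scores,
% p_c are individual-specific distributions, p_bg impostor distributions.
\[
  \Lambda(c) \;=\;
  \frac{p^{f}_{c}(s_f)}{p^{f}_{\mathrm{bg}}(s_f)}
  \cdot
  \frac{p^{v}_{c}(s_v)}{p^{v}_{\mathrm{bg}}(s_v)},
  \qquad \text{accept cluster } c \text{ iff } \Lambda(c) \ge \tau,
\]

where $s_f$ and $s_v$ are the face and voice match scores of the query against cluster $c$, $p^{f}_{c}$ and $p^{v}_{c}$ are the individual-specific score distributions learnt from the cluster, $p^{f}_{\mathrm{bg}}$ and $p^{v}_{\mathrm{bg}}$ are background (impostor) score distributions, and $\tau$ is a fixed decision threshold. Because each factor is already a calibrated ratio, the product needs no per-modality normalization.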