Word discovery is the task of finding and collecting occurrences of repeating words in the absence of prior acoustic and linguistic knowledge or training material. The ability to extract such patterns (or motifs) is a preliminary step towards automatic mining of contentful information in spoken documents. The absence of models and training data forces the use of direct pattern matching of speech templates, which in turn is sensitive to speech variability, for instance inter-speaker variability. In the present work, a variability-tolerant pattern recognition technique is proposed that relies on the comparison of self-similarity matrices of speech sequences. The joint use of this technique and a dynamic time warping (DTW) dissimilarity measure is shown to account for more variability than the DTW-based system alone, as demonstrated on several hours of broadcast news shows.
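
To make the two measures concrete, the following is a minimal sketch and not the system evaluated in this work. It assumes each speech sequence is a list of feature vectors (one per frame), uses Euclidean distance between frames, and compares self-similarity matrices after linearly resampling both sequences to a fixed number of frames. The function names, the resampling step, the Frobenius-norm comparison and the length normalisation are all illustrative assumptions.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def resample(frames, n):
    """Linearly interpolate a sequence of frames to n frames (n >= 2).
    Assumed normalisation step so that matrices have equal size."""
    if len(frames) == 1:
        return [list(frames[0]) for _ in range(n)]
    out = []
    for k in range(n):
        pos = k * (len(frames) - 1) / (n - 1)
        lo = min(int(pos), len(frames) - 2)
        w = pos - lo
        out.append([(1 - w) * x + w * y
                    for x, y in zip(frames[lo], frames[lo + 1])])
    return out

def self_similarity_matrix(frames):
    """Matrix of pairwise distances between all frames of one sequence."""
    return [[euclidean(u, v) for v in frames] for u in frames]

def ssm_dissimilarity(a, b, n=16):
    """Frobenius norm of the difference between the two resampled
    self-similarity matrices, divided by n (illustrative choice)."""
    ssm_a = self_similarity_matrix(resample(a, n))
    ssm_b = self_similarity_matrix(resample(b, n))
    total = sum((x - y) ** 2
                for row_a, row_b in zip(ssm_a, ssm_b)
                for x, y in zip(row_a, row_b))
    return math.sqrt(total) / n

def dtw_dissimilarity(a, b):
    """Classic DTW with steps (1,0), (0,1), (1,1); the accumulated cost
    is divided by len(a) + len(b) (illustrative normalisation)."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = euclidean(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[len(a)][len(b)] / (len(a) + len(b))

if __name__ == "__main__":
    # Synthetic "template" and a copy shifted by a constant offset,
    # a crude stand-in for a systematic speaker-dependent change.
    template = [[math.sin(0.3 * t + d) for d in range(4)] for t in range(20)]
    shifted = [[x + 10.0 for x in frame] for frame in template]
    print("DTW:", dtw_dissimilarity(template, shifted))
    print("SSM:", ssm_dissimilarity(template, shifted))
```

In this toy setting a constant offset added to every frame leaves the self-similarity matrix unchanged, because only differences between frames of the same sequence enter it, whereas the frame-to-frame DTW cost grows with the offset. Real inter-speaker variability is far more complex than a constant shift, so the sketch only illustrates why the two measures can be complementary.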