In modern multimedia databases, objects can be represented by a large variety of feature representations. To exploit all available information in the best possible way, a joint statement about object similarity must be derived. In this paper, we present a novel technique for multi-represented similarity estimation based on probability distributions that model the connection between distance values and object similarity. To tune these distributions to the notion of similarity in each representation, we propose a bootstrapping approach that maximizes the agreement between the distributions. In this way, we capture the general notion of similarity that is implicitly given by the distance relationships in the available feature representations, so our approach does not require any training examples. In our experimental evaluation, we demonstrate that the new approach offers superior precision and recall compared to standard similarity measures on a real-world audio data set.
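The following is not taken from the paper; it is a minimal sketch of the general idea behind agreement-based tuning, under assumptions of our own: each representation maps a distance to a similarity probability via a logistic curve, and "agreement" is measured as the mean squared deviation of each representation's estimate from the consensus (mean) estimate. All data, parameter values, and names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 40 object pairs, 3 feature representations (synthetic data).
# Each representation yields one distance per pair; a smaller distance
# should indicate a higher similarity.
n_pairs, n_reps = 40, 3
latent = rng.random(n_pairs)  # hidden "true" dissimilarity per pair
distances = np.clip(
    latent[:, None] + 0.1 * rng.standard_normal((n_pairs, n_reps)), 0.0, None
)

def similarity(d, a, b):
    """Map a distance to a similarity probability via a logistic curve."""
    return 1.0 / (1.0 + np.exp(a * (d - b)))

# One logistic curve (steepness a, midpoint b) per representation.
a = np.full(n_reps, 5.0)
b = np.full(n_reps, 0.5)

def disagreement(a, b):
    """Mean squared deviation of each representation from the consensus."""
    s = similarity(distances, a[None, :], b[None, :])
    return np.mean((s - s.mean(axis=1, keepdims=True)) ** 2)

d_start = disagreement(a, b)

# Bootstrapping loop: repeatedly pull each representation's curve toward
# the current consensus of all representations via gradient descent on
# the squared disagreement (no labeled training examples involved).
lr = 0.5
for _ in range(200):
    s = similarity(distances, a[None, :], b[None, :])
    consensus = s.mean(axis=1, keepdims=True)
    err = s - consensus                      # (n_pairs, n_reps)
    ds = s * (1.0 - s)                       # logistic derivative factor
    # logit = -a * (d - b), so d(logit)/da = -(d - b), d(logit)/db = a
    grad_a = np.mean(err * ds * -(distances - b[None, :]), axis=0)
    grad_b = np.mean(err * ds * a[None, :], axis=0)
    a -= lr * grad_a
    b -= lr * grad_b

d_end = disagreement(a, b)

# Joint similarity estimate: average of the tuned per-representation curves.
joint = similarity(distances, a[None, :], b[None, :]).mean(axis=1)
```

After tuning, the per-representation curves assign more consistent similarity probabilities to the same object pairs, and the averaged `joint` score serves as the combined similarity estimate. The paper's actual distribution model and optimization procedure may differ substantially from this logistic/least-squares stand-in.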