Measuring the similarity of two musical pieces is an ill-defined problem, for which recent research has shown contextual information, assigned as free-form text (tags) in social networking services, to be highly effective. Nevertheless, approaches based on contextual information require an adequate number of tags per musical datum in order to be effective. In the so-called "cold-start" problem, this assumption does not hold for many musical data. In this paper, we address this problem by proposing a combination of the audio and the tag feature spaces of musical data. Experimental results with real musical data show that, for musical data lacking contextual information, the proposed combination estimates similarity more accurately than audio-based similarity alone.