A huge amount of data and metadata emerges from Web 2.0 applications, which have transformed the Web into a medium for mass social interaction and collaboration. Collaborative tagging systems are a typical, popular, and promising class of Web 2.0 applications, but despite their wide adoption they face serious limitations that restrict their usability. These limitations (lack of tag structure, tag validation issues, spamming, and redundancy) are more evident for multimedia content, whose automatic annotation and retrieval are particularly challenging. In this paper, we present an approach to social data clustering that jointly combines semantic, social, and content-based information. We propose an unsupervised model for efficient and scalable mining of multimedia social data, which leads to the extraction of rich and trustworthy semantics and improves retrieval in a social tagging system. Experimental results demonstrate the efficiency of the proposed approach.