This article investigates the effectiveness of community-generated tags as social descriptors of resources that community members annotate without any coordination. Our goal is to demonstrate empirically that the aggregate of the tags applied to a resource by the entire community describes that resource's meaning reasonably well. If it does, these aggregated tags can be used to calculate the semantic distance between resources. To test this hypothesis, we analyzed a large amount of data downloaded from del.icio.us. To this end, we developed an algorithm that searches for ‘similar’ URLs based on the similarity of their aggregated tag vectors, which allowed us to identify clusters of similar resources. Our experimental findings show that massive tagging of resources leads to resource meanings that are defined bottom-up, and they demonstrate the effectiveness of collaborative tagging systems for describing resources.
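The abstract does not say which similarity measure or clustering procedure the algorithm uses, so the following is only a minimal sketch under stated assumptions. It assumes each bookmark is a hypothetical `(user, url, tags)` record, builds one aggregated tag-count vector per URL, compares vectors with cosine similarity, and groups URLs into clusters as connected components of pairs whose similarity meets a threshold. The function names (`aggregate_tags`, `cosine`, `similar_urls`, `cluster_urls`), the sample data, and the threshold values are all illustrative, not taken from the paper.

```python
from collections import Counter
import math

# Hypothetical sample data: (user, url, tags) records, one per bookmark.
SAMPLE_BOOKMARKS = [
    ("alice", "python.org", ["python", "programming", "language"]),
    ("bob", "python.org", ["python", "programming"]),
    ("carol", "docs.python.org", ["python", "documentation", "programming"]),
    ("dave", "cnn.com", ["news", "world"]),
    ("erin", "bbc.co.uk", ["news", "uk"]),
]


def aggregate_tags(bookmarks):
    """Sum every user's tags for a URL into one tag-frequency vector per URL."""
    vectors = {}
    for _user, url, tags in bookmarks:
        vectors.setdefault(url, Counter()).update(tags)
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse tag-count vectors (assumed measure)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_urls(vectors, target, k=3, threshold=0.0):
    """Return up to k (url, score) pairs most similar to target, above threshold."""
    scores = [
        (cosine(vectors[target], vec), url)
        for url, vec in vectors.items()
        if url != target
    ]
    scores = [(s, u) for s, u in scores if s > threshold]
    scores.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, ties by URL
    return [(u, s) for s, u in scores[:k]]


def cluster_urls(vectors, threshold):
    """Group URLs into connected components of pairs with similarity >= threshold."""
    parent = {u: u for u in vectors}

    def find(u):
        # Union-find root lookup with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    urls = sorted(vectors)
    for i, a in enumerate(urls):
        for b in urls[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for u in urls:
        groups.setdefault(find(u), []).append(u)
    return sorted(groups.values())


if __name__ == "__main__":
    vectors = aggregate_tags(SAMPLE_BOOKMARKS)
    print(similar_urls(vectors, "python.org"))
    print(cluster_urls(vectors, 0.4))
    # prints [['bbc.co.uk', 'cnn.com'], ['docs.python.org', 'python.org']]
```

Raising the threshold splits clusters apart: with a threshold of 0.6 on the sample data, `cnn.com` and `bbc.co.uk` (which share only the tag `news`) end up in separate groups, while the two Python URLs stay together. The pairwise comparison is quadratic in the number of URLs, which is fine for illustration but would need indexing or candidate pruning at del.icio.us scale.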
Jinsheng Xu, Christo Dichev, Albert C. Esterline,