This paper presents a new model for measuring semantic similarity in the WordNet taxonomy using edge-counting techniques. We evaluate the model against a benchmark of human similarity judgments and obtain a substantial improvement over other methods: the correlation with average human judgment on a standard 28-word-pair dataset is 0.921, which is higher than any value reported in the literature and also significantly higher than that of individual human judges on average. Because this set has effectively been used for algorithm selection and tuning, we also cross-validate on an independent 37-word-pair test set (0.876) and present results for the full 65-word-pair superset (0.897).
Dongqiang Yang, David M. W. Powers
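
As a rough illustration of the two ingredients named in the abstract, edge counting over an is-a taxonomy and correlation with human ratings, the following minimal sketch may help. It is not the model proposed in the paper: the toy taxonomy, the `1 / (1 + distance)` similarity formula, and the human ratings are all hypothetical and chosen only to keep the example self-contained.

```python
from collections import deque
from math import sqrt

# Hypothetical toy is-a taxonomy (child -> parent); NOT taken from WordNet.
TOY_HYPERNYMS = {
    "car": "vehicle",
    "bicycle": "vehicle",
    "vehicle": "artifact",
    "fork": "artifact",
    "artifact": "entity",
    "bird": "animal",
    "animal": "entity",
}


def build_graph(hypernyms):
    """Turn child -> parent links into an undirected adjacency map."""
    graph = {}
    for child, parent in hypernyms.items():
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    return graph


def edge_count(graph, a, b):
    """Shortest path length in edges between a and b (BFS); None if unconnected."""
    if a == b:
        return 0
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph[node]:
            if nxt == b:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def path_similarity(graph, a, b):
    """Assumed illustrative formula: similarity decays as 1 / (1 + edge count)."""
    dist = edge_count(graph, a, b)
    return 0.0 if dist is None else 1.0 / (1.0 + dist)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


GRAPH = build_graph(TOY_HYPERNYMS)

# Made-up ratings on a 0-4 scale, for illustration only; these are not
# values from any benchmark dataset.
PAIRS = [("car", "bicycle"), ("car", "fork"), ("car", "bird")]
HYPOTHETICAL_HUMAN_RATINGS = [3.2, 1.1, 0.4]

model_scores = [path_similarity(GRAPH, a, b) for a, b in PAIRS]
print("correlation with (hypothetical) human ratings:",
      pearson(model_scores, HYPOTHETICAL_HUMAN_RATINGS))
```

With a real evaluation, the toy taxonomy would be replaced by WordNet's hypernym hierarchy and the ratings by the averaged human judgments for each word pair; the correlation step stays the same.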