Judging how closely a pair of words (or texts) is semantically related is a cognitive process. However, previous algorithms for computing semantic relatedness are largely based on co-occurrences within textual windows and do not actively leverage human perceptions of relatedness. To bridge this perceptual gap, we propose using free association as a signal that captures such perceptions. Free association, being manually evaluated, has limited lexical coverage and is inherently sparse. We therefore expand lexical coverage and overcome sparseness by constructing an association network of terms and concepts that combines signals from free association norms with five types of co-occurrences extracted from the rich structures of Wikipedia. Our evaluation shows that simple algorithms on this network give competitive results in computing semantic relatedness between words and between short texts.
Keyang Zhang, Kenny Q. Zhu, Seung-won Hwang