Word Sense Induction (WSI) is the task of identifying the different senses (uses) of a target word in a given text. Traditional graph-based approaches create and then cluster a graph, in which each vertex corresponds to a word that co-occurs with the target word, and edges between vertices are weighted based on the co-occurrence frequency of their associated words. In contrast, in our approach each vertex corresponds to a collocation that co-occurs with the target word, and edges between vertices are weighted based on the co-occurrence frequency of their associated collocations. A smoothing technique is applied to identify more edges between vertices and the resulting graph is then clustered. Our evaluation under the framework of SemEval-2007 WSI task shows the following: (a) our approach produces less sense-conflating clusters than those produced by traditional graph-based approaches, (b) our approach outperforms the existing state-of-the-art results.
Ioannis P. Klapaftis, Suresh Manandhar