Sense annotation and lexicon building are costly and demand prudent investment of resources. Recent work on multilingual word sense disambiguation (WSD) has shown that the annotation work done for one language (the source language, SL) can be leveraged for another (the target language, TL) by projecting the Wordnet and sense-marked corpus parameters of SL to TL. However, this work does not account for the cost of manually cross-linking the words within aligned synsets. Nor does it answer the question, "Can better accuracy be achieved if a user is willing to pay additional money?" We propose a cost-benefit measure that quantifies the "value for money" earned, in terms of accuracy, by investing in annotation effort and lexicon building. Two key ideas explored in this paper are (i) the use of a probabilistic cross-linking model to reduce manual cross-linking effort and (ii) the use of selective sampling to inject a few training examples for hard-to-disambiguate words from the target language t...
Mitesh M. Khapra, Saurabh Sohoney, Anup Kulkarni,