Modern software systems are typically large and complex, making comprehension of these systems extremely difficult. Experienced programmers comprehend code by seamlessly processing synonyms and other word relations. Thus, we believe that automated comprehension and software tools can be significantly improved by leveraging word relations in software. In this paper, we perform a comparative study of six state of the art, English-based semantic similarity techniques and evaluate their effectiveness on words from the comments and identifiers in software. Our results suggest that applying English-based semantic similarity techniques to software without any customization could be detrimental to the performance of the client software tools. We propose strategies to customize the existing semantic similarity techniques to software, and describe how various program comprehension tools can benefit from word relation information.
Giriprasad Sridhara, Emily Hill, Lori L. Pollock,