As more online databases are integrated into digital libraries, the issue of quality control of the data becomes increasingly important, especially as it relates to the effective retrieval of information. The need to discover and reconcile variant forms of strings in bibliographic entries, i.e., authority work, will become more critical in the future. Spelling variants, misspellings, and transliteration differences will all increase the difficulty of retrieving information. Approximate string matching has traditionally been used to help with this problem. In this paper we introduce the notion of approximate word matching and show how it can be used to improve detection and categorization of variant forms.
James C. French, Allison L. Powell, Eric Schulman