An approach is presented for mining repositories of web-based user documentation for patterns of evolutionary change in the context of internationalization and localization. Sets of documents that are changed together during the translation process are uncovered and documented to support future evolution of the system. A sequential-pattern mining technique is used to uncover the patterns from Subversion repositories. The approach is applied to the open source KDE system, which maintains documentation in over fifty natural languages and is a prime example of the problem. Characteristics of the uncovered patterns, such as size, frequency, and occurrence within a single language or across multiple languages, are discussed. Such patterns provide insight into the effort required for retranslation after a change in the documentation and help user communities estimate the progress of documentation in their respective languages.
Huzefa H. Kagdi, Jonathan I. Maletic
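
To make the notion of "documents that are changed together" concrete, the following is a minimal sketch and not the authors' method: it counts unordered sets of files that co-occur in commits, which is a simpler stand-in for the sequential-pattern mining technique named in the abstract. The function name `frequent_cochanges`, its parameters, and the file paths are all hypothetical and chosen only for illustration; the commits are assumed to have already been extracted from a repository log as lists of changed paths.

```python
from collections import Counter
from itertools import combinations


def frequent_cochanges(commits, min_support=2, max_size=3):
    """Return file sets that appear together in at least `min_support` commits.

    `commits` is an iterable of lists of changed file paths (one list per
    commit). Only sets of 2 up to `max_size` files are considered. This is
    plain co-occurrence counting, not sequential-pattern mining.
    """
    counts = Counter()
    for files in commits:
        unique = sorted(set(files))  # canonical order so tuples compare equal
        for size in range(2, max_size + 1):
            for combo in combinations(unique, size):
                counts[combo] += 1
    return {combo: n for combo, n in counts.items() if n >= min_support}


# Hypothetical change-sets; these paths are invented for the example.
commits = [
    ["de/kmail/index.docbook", "fr/kmail/index.docbook"],
    ["de/kmail/index.docbook", "fr/kmail/index.docbook", "es/kmail/index.docbook"],
    ["de/kate/index.docbook"],
]

patterns = frequent_cochanges(commits)
for files, support in patterns.items():
    print(support, files)
```

In this toy input, only the German and French versions of the same document change together in two commits, so they form the single pattern that meets the support threshold. Grouping such patterns by language prefix would separate those that stay within one language from those that span several.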