An enormous amount of information is available via the Internet. Much of this data is in the form of text-based documents covering a variety of topics that are vitally important to the scientific, business, and defense/security communities. Many techniques currently exist for processing and analyzing such data. However, quickly characterizing a large set of documents remains challenging. Previous work successfully demonstrated the use of a genetic algorithm to provide a representative subset of text documents via adaptive sampling. In this work, we expand and explore this approach on much larger data sets using a parallel Genetic Algorithm (GA) with adaptive parameter control. Experimental results are presented and discussed.

Categories and Subject Descriptors
I.7.0 [Document and Text Processing]: General

General Terms
Algorithms, Performance, Design, Experimentation

Keywords
Text analysis, parallel genetic algorithm, intellig...
Robert M. Patton, Thomas E. Potok