
GECCO
2006
Springer

Characterizing large text corpora using a maximum variation sampling genetic algorithm

An enormous amount of information is available via the Internet, much of it in the form of text-based documents. These documents cover a variety of topics that are vitally important to the scientific, business, and defense/security communities. Many techniques currently exist for processing and analyzing such data; however, quickly characterizing a large set of documents remains challenging. Previous work has successfully demonstrated the use of a genetic algorithm to provide a representative subset of text documents via adaptive sampling. In this work, we further expand and explore this approach on much larger data sets using a parallel genetic algorithm (GA) with adaptive parameter control. Experimental results are presented and discussed.
Categories and Subject Descriptors: I.7.0 [Document and Text Processing]: General
General Terms: Algorithms, Performance, Design, Experimentation
Keywords: Text analysis, parallel genetic algorithm, intellig...
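The abstract does not specify the chromosome encoding, fitness function, or adaptation rule, so the following is only a minimal, serial sketch of the general idea of maximum variation sampling with a GA, under our own assumptions. Documents are toy term-frequency vectors, a chromosome is a set of k document indices, fitness is the sum of pairwise cosine distances within the subset (larger means a more varied sample), and the mutation rate rises when the best score stalls and falls when it improves. All function names (`max_variation_ga`, `fitness`, `crossover`, `mutate`) are hypothetical and are not taken from the authors' implementation.

```python
import math
import random
from itertools import combinations


def cosine_distance(a, b):
    """1 - cosine similarity between two equal-length term-frequency vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)


def fitness(subset, docs):
    """Assumed variation score: sum of pairwise cosine distances in the subset."""
    return sum(cosine_distance(docs[i], docs[j]) for i, j in combinations(subset, 2))


def crossover(p1, p2, k, rng):
    """Child draws k distinct indices from the union of both parents' indices."""
    pool = sorted(set(p1) | set(p2))  # union always holds at least k indices
    return rng.sample(pool, k)


def mutate(subset, n_docs, rate, rng):
    """With probability `rate` per gene, swap an index for one not yet selected."""
    child = list(subset)
    for pos in range(len(child)):
        if rng.random() < rate:
            unused = [i for i in range(n_docs) if i not in child]
            if unused:
                child[pos] = rng.choice(unused)
    return child


def max_variation_ga(docs, k, pop_size=20, generations=50, seed=0):
    """Return (sorted indices of the most varied k-subset found, its score)."""
    rng = random.Random(seed)
    n = len(docs)
    pop = [rng.sample(range(n), k) for _ in range(pop_size)]
    rate = 0.1  # initial per-gene mutation rate (assumed value)
    best = max(pop, key=lambda s: fitness(s, docs))
    best_fit = fitness(best, docs)

    def pick():
        # Binary tournament selection over the current population.
        a, b = rng.sample(pop, 2)
        return a if fitness(a, docs) >= fitness(b, docs) else b

    for _ in range(generations):
        children = [best]  # elitism: carry the best subset forward unchanged
        while len(children) < pop_size:
            children.append(mutate(crossover(pick(), pick(), k, rng), n, rate, rng))
        pop = children
        gen_best = max(pop, key=lambda s: fitness(s, docs))
        gen_fit = fitness(gen_best, docs)
        # Assumed adaptive rule: exploit after progress, explore more when stuck.
        if gen_fit > best_fit:
            best, best_fit = gen_best, gen_fit
            rate = max(0.01, rate * 0.9)
        else:
            rate = min(0.5, rate * 1.1)
    return sorted(best), best_fit


if __name__ == "__main__":
    # Toy vectors: three loose "topics" over a 3-term vocabulary.
    docs = [[5, 0, 0], [4, 1, 0], [5, 1, 0],
            [0, 5, 0], [0, 4, 1],
            [0, 0, 5], [1, 0, 4]]
    subset, score = max_variation_ga(docs, k=3)
    print(subset, round(score, 3))
```

Recomputing fitness inside every tournament is wasteful; on a large corpus one would cache scores and, in the spirit of the parallel GA the abstract describes, distribute fitness evaluation across workers.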
Robert M. Patton, Thomas E. Potok
Added 23 Aug 2010
Updated 23 Aug 2010
Type Conference
Year 2006
Where GECCO
Authors Robert M. Patton, Thomas E. Potok