Text classification remains one of the major fields of research in natural language processing. This paper evaluates the computational tool Coh-Metrix as a means of distinguishing between seemingly similar text-types. Using a discriminant analysis on a corpus of second language reading texts, it demonstrates that Coh-Metrix is able to significantly distinguish authentic text-types from ones that have been specifically simplified for second language readers. The findings are relevant to text classification research, to developers of second language reading materials, and to second language teachers, because they show that moderate, shallow textual changes can affect discourse structures.
Scott A. Crossley, Philip M. McCarthy, Danielle S.