Sciweavers

LREC
2008

Automatic Assessment of Japanese Text Readability Based on a Textbook Corpus

This paper describes a method for measuring the readability of Japanese texts based on a newly compiled textbook corpus. The corpus consists of 1,478 sample passages extracted from 127 textbooks for elementary school, junior high school, high school, and university; it is divided into thirteen grade levels and totals about one million characters. For a given text passage, the method uses character-unigram models constructed from the textbook corpus to determine the grade level to which the passage is most similar. Because the method requires neither sentence-boundary nor word-boundary analysis, it is applicable to texts that include incomplete sentences and irregular text fragments. Its performance, measured by the correlation coefficient, is high (R > 0.9); even when a text passage is limited to 25 characters, the correlation coefficient remains high (R = 0.83)...
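The grade-level decision described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how similarity to a grade-level model is computed, so the sketch assumes an average per-character log-likelihood with add-one smoothing. The function names and the two-grade toy corpus are hypothetical and stand in for the thirteen-level textbook corpus.

```python
import math
from collections import Counter


def train_unigram_models(corpus_by_grade):
    """Count characters per grade level and collect the shared vocabulary."""
    counts = {grade: Counter("".join(passages))
              for grade, passages in corpus_by_grade.items()}
    vocab = set()
    for char_counts in counts.values():
        vocab.update(char_counts)
    return counts, vocab


def avg_log_likelihood(text, char_counts, vocab_size):
    """Average per-character log-probability (assumed add-one smoothing)."""
    total = sum(char_counts.values())
    denom = total + vocab_size + 1  # +1 reserves mass for unseen characters
    score = sum(math.log((char_counts[ch] + 1) / denom) for ch in text)
    return score / max(len(text), 1)


def estimate_grade(text, counts, vocab):
    """Return the grade whose unigram model scores the passage highest."""
    return max(counts,
               key=lambda g: avg_log_likelihood(text, counts[g], len(vocab)))


# Hypothetical toy corpus: two grade levels with two short passages each.
toy_corpus = {
    1: ["あさ、はやく おきました。", "いぬが はしります。"],
    10: ["経済政策の影響を分析する。", "環境問題は複雑化している。"],
}
counts, vocab = train_unigram_models(toy_corpus)
print(estimate_grade("ねこが はしります", counts, vocab))
print(estimate_grade("経済の問題を分析", counts, vocab))
```

Because the models operate on raw characters, the sketch needs no tokenizer or sentence splitter, which mirrors the property the abstract highlights for incomplete sentences and text fragments.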
Satoshi Sato, Suguru Matsuyoshi, Yohsuke Kondoh
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2008
Where LREC