Automatic Assessment of Japanese Text Readability Based on a Textbook Corpus

This paper describes a method for measuring the readability of Japanese texts based on a newly compiled textbook corpus. The corpus consists of 1,478 sample passages extracted from 127 textbooks for elementary school, junior high school, high school, and university; it is divided into thirteen grade levels and totals about one million characters. For a given text passage, the method determines the grade level to which the passage is most similar by using character-unigram models constructed from the textbook corpus. Because the method requires neither sentence-boundary nor word-boundary analysis, it is applicable to texts that include incomplete sentences and irregular text fragments. The performance of the method, measured by the correlation coefficient, is high (R > 0.9); even when the length of a text passage is limited to 25 characters, the correlation coefficient remains high (R = 0.83)...
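
The idea of matching a passage to a grade level with character-unigram models can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not say how the models are smoothed or how similarity is computed, so the add-one smoothing and the "pick the grade with the highest log-likelihood" rule are assumptions. The two-grade toy corpus and the names build_models, log_likelihood, and predict_grade are hypothetical.

```python
import math
from collections import Counter


def build_models(corpus):
    """Count characters per grade. corpus maps grade level -> list of passages."""
    counts = {grade: Counter("".join(passages)) for grade, passages in corpus.items()}
    vocab = set()
    for counter in counts.values():
        vocab.update(counter)
    return counts, vocab


def log_likelihood(passage, counter, vocab_size):
    """Log-probability of the passage under one grade's character-unigram model.

    Add-one smoothing is an assumption made here; one extra slot is reserved
    so that characters never seen in any grade still get nonzero probability.
    """
    denom = sum(counter.values()) + vocab_size + 1
    return sum(math.log((counter[ch] + 1) / denom) for ch in passage)


def predict_grade(passage, counts, vocab):
    """Return the grade whose model assigns the passage the highest likelihood."""
    size = len(vocab)
    return max(counts, key=lambda grade: log_likelihood(passage, counts[grade], size))


# Invented toy data (NOT from the textbook corpus): grade 1 is kana-heavy,
# grade 13 is kanji-heavy.
toy_corpus = {
    1: ["きょうははれです。", "いぬがはしる。"],
    13: ["統計的言語模型は確率分布を推定する。"],
}
counts, vocab = build_models(toy_corpus)

print(predict_grade("ねこがはしる", counts, vocab))
print(predict_grade("確率分布の推定", counts, vocab))
```

Because the models work on raw characters, no tokenizer or sentence splitter is needed, which is the property the abstract highlights for incomplete sentences and short fragments.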

Satoshi Sato, Suguru Matsuyoshi, Yohsuke Kondoh

Education | LREC 2008 | Readability Measurement | Text Passage | Textbook Corpus |

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2008
Where	LREC
Authors	Satoshi Sato, Suguru Matsuyoshi, Yohsuke Kondoh
