Morphological Analysis of a Large Spontaneous Speech Corpus in Japanese

This paper describes two methods for detecting word segments and their morphological information in a Japanese spontaneous speech corpus, and shows how to tag a large spontaneous speech corpus accurately by using the two methods. The first method is used to detect any type of word segment. The second method is used when there are several definitions for word segments and their POS categories, and when one type of word segment includes another type. We show that semiautomatic analysis achieves a precision of better than 99% for detecting and tagging short words and 97% for long words, the two types of words that make up the corpus. We also show that using both methods yields better accuracy than using only the first.

Kiyotaka Uchimoto, Chikashi Nobata, Atsushi Yamada

ACL 2003 | ACL 2007 | Japanese Spontaneous Speech | Spontaneous Speech Corpus | Word Segments |

Post Info

Added	31 Oct 2010
Updated	31 Oct 2010
Type	Conference
Year	2003
Where	ACL
Authors	Kiyotaka Uchimoto, Chikashi Nobata, Atsushi Yamada, Satoshi Sekine, Hitoshi Isahara
