Sciweavers

AIRS
2004
Springer

Combining Sentence Length with Location Information to Align Monolingual Parallel Texts

Different Chinese translations of the same foreign masterpiece provide an abundant source of Chinese paraphrases on the Internet. A paraphrase corpus is a corpus of sentence pairs that convey the same information. Real monolingual parallel texts are irregular; in particular, paragraph boundaries are not strictly aligned between two translations, which poses a challenge for alignment technology. Traditional alignment methods designed for bilingual texts have difficulty handling this. This paper describes a new method for aligning real monolingual parallel texts using the length and location information of sentence pairs. The model is motivated by the observation that the locations of sentence pairs of a certain length are distributed similarly throughout the whole text. To date, a paraphrase corpus of about fifty thousand sentence pairs has been constructed. Categories and Subject Descriptors I.2.7 [Artificial Intelligence]: Natural Language P...
Weigang Li, Ting Liu, Sheng Li
Added 20 Aug 2010
Updated 20 Aug 2010
Type Conference
Year 2004
Where AIRS
Authors Weigang Li, Ting Liu, Sheng Li