Sciweavers

AIRS
2004
Springer

Multilingual Relevant Sentence Detection Using Reference Corpus

Information retrieval (IR) with a reference corpus is one approach to relevant sentence detection: the result of IR is taken as the representation of the query (sentence). Lack of information and language differences are two major issues in detecting relevance among multilingual sentences. This paper draws on a parallel corpus for information expansion and translation, and introduces three representations: sentence-vector, document-vector, and term-vector. Both sentence-aligned and document-aligned corpora, i.e., the Sinorama corpus and the HKSAR corpus, are used. The effects of alignment granularity, corpus domain, corpus size, language basis, and term selection strategy are examined. The experimental results show that a mean reciprocal rank (MRR) of 0.839 is achieved for similarity computation between multilingual sentences when a larger, finer-grained parallel corpus from the same domain as the test data is adopted. Generally speaking, the sentence-vector approach is superior to the term-vector approach when sentence-...
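The abstract describes representing a sentence by what IR returns from a parallel reference corpus, so that sentences in different languages become comparable through the aligned units they both retrieve. The sketch below illustrates that sentence-vector idea and the MRR metric behind the reported 0.839. It is a minimal sketch under loud assumptions: the four-unit corpus, the stoplist, the raw term-overlap retrieval score, the top-k cutoff, and the cosine comparison are hypothetical stand-ins, not the paper's actual IR model or weighting.

```python
import math
from collections import Counter

# Hypothetical toy sentence-aligned parallel corpus: (unit id, English, Chinese).
# The Chinese side is pre-segmented with spaces; a real system needs a segmenter.
PARALLEL = [
    (1, "the typhoon hit taiwan", "颱風 侵襲 台灣"),
    (2, "the stock market fell sharply", "股市 大幅 下跌"),
    (3, "heavy rain from the typhoon caused floods", "颱風 豪雨 造成 水災"),
    (4, "the stock market recovered today", "股市 今天 回升"),
]
STOPWORDS = {"the", "a", "in", "from", "of"}  # illustrative English stoplist


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [t for t in text.lower().split() if t not in STOPWORDS]


def retrieve(query, side, k=3):
    """Return a sparse sentence-vector {unit_id: score} for the query.

    Each aligned unit on the chosen side ("en" or "zh") is scored by raw
    term overlap with the query (an assumed stand-in for a real IR model);
    the top k units with a nonzero score become the vector's dimensions.
    """
    q = Counter(tokenize(query))
    scores = {}
    for uid, en, zh in PARALLEL:
        d = Counter(tokenize(en if side == "en" else zh))
        overlap = sum(min(n, d[t]) for t, n in q.items())
        if overlap:
            scores[uid] = overlap
    top = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    return dict(top)


def cosine(u, v):
    """Cosine similarity between two sparse vectors keyed by unit id."""
    dot = sum(w * v[i] for i, w in u.items() if i in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def cross_lingual_similarity(en_sentence, zh_sentence, k=3):
    """Compare sentences in different languages through shared aligned unit ids."""
    return cosine(retrieve(en_sentence, "en", k), retrieve(zh_sentence, "zh", k))


def mean_reciprocal_rank(rankings, gold):
    """MRR: mean of 1/rank of the correct item per query (0 if it is absent)."""
    total = 0.0
    for ranked, answer in zip(rankings, gold):
        if answer in ranked:
            total += 1.0 / (ranked.index(answer) + 1)
    return total / len(gold)


if __name__ == "__main__":
    candidates = ["颱風 造成 水災", "股市 下跌"]
    queries = {
        "typhoon floods in taiwan": "颱風 造成 水災",
        "the stock market fell": "股市 下跌",
    }
    # Rank the Chinese candidates for each English query by similarity.
    rankings = [
        sorted(candidates, key=lambda c: -cross_lingual_similarity(q, c))
        for q in queries
    ]
    print(rankings)
    print(mean_reciprocal_rank(rankings, list(queries.values())))
```

Because the two languages share no tokens, the similarity is nonzero only when both sentences retrieve overlapping aligned units, which is the role the parallel corpus plays here. With a document-aligned corpus, the same scheme would presumably use document ids as the vector dimensions instead of sentence ids.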
Ming-Hung Hsu, Ming-Feng Tsai, Hsin-Hsi Chen
Type Conference