Multilingual Relevant Sentence Detection Using Reference Corpus

Information retrieval (IR) with a reference corpus is one approach to relevant sentence detection: it takes the result of IR as the representation of a query (sentence). Lack of information and language difference are the two major issues in detecting relevance among multilingual sentences. This paper uses a parallel corpus for information expansion and translation, and introduces three representations: sentence-vector, document-vector, and term-vector. Both sentence-aligned and document-aligned corpora are used, namely the Sinorama corpus and the HKSAR corpus. The factors addressed are alignment granularity, corpus domain, corpus size, language basis, and term selection strategy. The experimental results show that an MRR of 0.839 is achieved for similarity computation between multilingual sentences when a larger, finer-grained parallel corpus of the same domain as the test data is adopted. Generally speaking, the sentence-vector approach is superior to the term-vector approach when sentence-...
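The sentence-vector idea described in the abstract can be illustrated with a rough sketch. It is not the authors' implementation: the toy corpus (English paired with unaccented French, standing in for the real Chinese-English data), the term-overlap retrieval score, the cosine comparison, and all function names below are assumptions made for illustration only. Each sentence is used as a query against its own language side of a sentence-aligned corpus, and the resulting list of retrieval scores serves as its language-independent representation.

```python
import math
from collections import Counter

# Hypothetical toy sentence-aligned parallel corpus: (English side, French side).
# The paper uses Sinorama and HKSAR; this data is invented for illustration.
PARALLEL = [
    ("the market opened higher today", "le marche a ouvert en hausse aujourd'hui"),
    ("heavy rain flooded the city", "de fortes pluies ont inonde la ville"),
    ("the team won the final match", "l'equipe a gagne le match final"),
]

def overlap_score(query, doc):
    """Assumed stand-in for an IR score: number of shared term occurrences."""
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    return float(sum(min(q[t], d[t]) for t in q))

def sentence_vector(sentence, side):
    """Represent a sentence by its retrieval scores against one side
    (0 = English, 1 = French) of every aligned pair in the corpus."""
    return [overlap_score(sentence, pair[side]) for pair in PARALLEL]

def cosine(u, v):
    """Cosine similarity between two equal-length score vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def mean_reciprocal_rank(ranks):
    """MRR over the 1-based ranks at which the correct item was found."""
    return sum(1.0 / r for r in ranks) / len(ranks)

en = sentence_vector("rain flooded the streets of the city", side=0)
fr_related = sentence_vector("la ville a ete inondee par les pluies", side=1)
fr_unrelated = sentence_vector("l'equipe a gagne", side=1)

print(cosine(en, fr_related), cosine(en, fr_unrelated))
```

Because both vectors are indexed by the same aligned pairs, sentences in different languages become comparable without translating either one directly. With this toy data, the related French sentence scores higher against the English sentence than the unrelated one does.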

Ming-Hung Hsu, Ming-Feng Tsai, Hsin-Hsi Chen

AIRS 2004 | Information Retrieval | Parallel Corpus | Reference Corpus | Reference Corpus Approach

» Semantic Role Tagging for Chinese at the Lexical Level

» Construction of a Benchmark Data Set for Crosslingual Word Sense Disambiguation

» On Crosslingual Plagiarism Analysis using a Statistical Model

Post Info

Added	30 Jun 2010
Updated	30 Jun 2010
Type	Conference
Year	2004
Where	AIRS
Authors	Ming-Hung Hsu, Ming-Feng Tsai, Hsin-Hsi Chen
