Resource selection for domain-specific cross-lingual IR

An under-explored question in cross-language information retrieval (CLIR) is to what degree the performance of CLIR methods depends on the availability of high-quality translation resources for particular domains. To address this issue, we evaluate several competitive CLIR methods - with different training corpora - on test documents in the medical domain. Our results show severe performance degradation when using a general-purpose training corpus or a commercial machine translation system (SYSTRAN), versus a domain-specific training corpus. A related unexplored question is whether we can improve CLIR performance by systematically analyzing training resources and optimally matching them to target collections. We start exploring this problem by suggesting a simple criterion for automatically matching training resources to target corpora. By using cosine similarity between training and target corpora as resource weights, we obtained an average improvement of 5.6% over using all resources...
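
The matching criterion mentioned in the abstract (cosine similarity between each training corpus and the target corpus, used as a resource weight) might be sketched as follows. This is an illustration only, not the authors' implementation: the whitespace tokenization, the raw term-frequency vectors, the normalization of weights to sum to 1, and the names `term_vector`, `cosine`, and `resource_weights` are all assumptions made for this example.

```python
import math
from collections import Counter


def term_vector(docs):
    """Aggregate a corpus (a list of strings) into one term-frequency vector.

    Assumption: lowercase whitespace tokenization; the abstract does not
    say how corpora are represented.
    """
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def resource_weights(training_corpora, target_docs):
    """Weight each training resource by its similarity to the target corpus.

    Assumption: weights are normalized to sum to 1; if no resource overlaps
    with the target at all, fall back to uniform weights.
    """
    if not training_corpora:
        return {}
    target_vec = term_vector(target_docs)
    sims = {
        name: cosine(term_vector(docs), target_vec)
        for name, docs in training_corpora.items()
    }
    total = sum(sims.values())
    if total == 0:
        return {name: 1.0 / len(sims) for name in sims}
    return {name: sim / total for name, sim in sims.items()}


# Hypothetical toy data, invented for illustration only.
training = {
    "medical": ["patient dose treatment", "clinical trial patient"],
    "news": ["election vote parliament", "market stocks"],
}
target = ["patient treatment outcome"]
weights = resource_weights(training, target)
```

In this toy setup the medical resource shares vocabulary with the target and the news resource does not, so the medical resource receives the larger weight. How such weights are then used inside a CLIR system (for example, to combine translation models trained on each resource) is left open here.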

Monica Rogati, Yiming Yang

Cross-language Information Retrieval | SIGIR 2004 | Target Corpora | Training

Added	30 Jun 2010
Updated	30 Jun 2010
Type	Conference
Year	2004
Where	SIGIR
Authors	Monica Rogati, Yiming Yang
