A study of statistical models for query translation: finding a good unit of translation

15 years 9 months ago


This paper presents a study of three statistical query translation models that use different units of translation. We begin with a review of a word-based translation model that uses co-occurrence statistics to resolve translation ambiguities. The translation selection problem is then formulated within the framework of graphical models, which makes it possible to discuss the modeling assumptions and limitations of the co-occurrence model and motivates the search for better translation units. Two other models that use larger, linguistically motivated translation units (i.e., noun phrases and dependency triples) are then presented. For each model, the modeling and training methods are described in detail. All query translation models are evaluated on TREC collections. Results show that larger translation units lead to more specific models that usually achieve better translation and cross-language information retrieval results. Categories and Subject Descriptors H.3.3 [Informati...

Jianfeng Gao, Jian-Yun Nie
