Abstract. We propose a lexicalized syntactic reordering framework for cross-language word alignment and translation research. In this framework, we first flatten hierarchical source-language parse trees into syntactically motivated linear string representations, which can easily be fed to many feature-based probabilistic models. During model training, these string representations, together with target-language word alignment information, are used to learn systematic similarities and differences between the grammars of the two languages. At runtime, the syntactic constituents of source-language parse trees are reordered according to the lexicalized reordering rules acquired automatically during training, so that they more closely match the word order of the target language. Empirical results show that bilingual word alignment and translation tasks benefit from our reordering methodology when it is used as a preprocessing component.
Chung-Chi Huang, Wei-Teh Chen, Jason S. Chang
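
The flattening and reordering steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tree encoding, the `flatten` and `reorder` helpers, the `label -> children` string format, and the two hand-written rules are all hypothetical, chosen only to show how a node can be turned into a lexicalized string and how a rule keyed on that string can permute the node's constituents. In the framework itself the rules are learned from word-aligned data rather than written by hand.

```python
from typing import Dict, List

# Assumed encoding (not from the paper): a leaf is (tag, word) and an
# internal node is (label, [children]).


def is_leaf(node) -> bool:
    return isinstance(node[1], str)


def words(node) -> List[str]:
    """Return the words under a node in left-to-right order."""
    if is_leaf(node):
        return [node[1]]
    return [w for child in node[1] for w in words(child)]


def flatten(node) -> str:
    """Flatten one internal node into a linear string.

    Leaf children are lexicalized as tag/word; other children keep
    only their constituent label.
    """
    label, children = node
    parts = [f"{c[0]}/{c[1]}" if is_leaf(c) else c[0] for c in children]
    return f"{label} -> " + " ".join(parts)


def reorder(node, rules: Dict[str, List[int]]):
    """Reorder constituents bottom-up using rules keyed on flattened strings.

    Each rule maps a flattened string to a permutation of child indices.
    """
    if is_leaf(node):
        return node
    label, children = node
    children = [reorder(c, rules) for c in children]
    perm = rules.get(flatten((label, children)))
    if perm is not None:
        children = [children[i] for i in perm]
    return (label, children)


# Toy source-language tree for "the book on the table".
tree = ("NP", [
    ("NP", [("DT", "the"), ("NN", "book")]),
    ("PP", [("IN", "on"), ("NP", [("DT", "the"), ("NN", "table")])]),
])

# Hypothetical rules; the second one is lexicalized on the word "on".
rules = {
    "NP -> NP PP": [1, 0],
    "PP -> IN/on NP": [1, 0],
}

reordered = reorder(tree, rules)
print(" ".join(words(reordered)))  # the table on the book
```

Keying rules on the flattened string keeps the lookup independent of tree depth, which is one plausible reason a linear representation is convenient as model input.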