The Linguistic Data Consortium (LDC) is currently involved in a major effort to expand its multilingual text resources, in particular for machine translation, message understandin...
The Arabic Treebank (ATB), released by the Linguistic Data Consortium, contains multiple annotation files for each source file, due in part to the role of diacritic inclusion in t...
The goal of the DARPA MADCAT (Multilingual Automatic Document Classification Analysis and Translation) Program is to automatically convert foreign language text images into Englis...
This paper describes a process of building a bilingual syntactically annotated corpus, the PCEDT (Prague Czech-English Dependency Treebank). The corpus is being created at Charles...