highly abstracted. The Chinese writing system uses logographs--conventional representations of words or morphemes. Characters of the most common kind have two parts, one suggesting...
Parallel web pages are an important source of training data for statistical machine translation. In this paper, we present a new approach to sentence alignment on parallel web pages....
Real-world natural language sentences are long and complex, and always contain unexpected grammatical constructions. They even include noise and ungrammaticality. This paper descri...
Until quite recently, extending Phrase-based Statistical Machine Translation (PBSMT) with syntactic knowledge caused system performance to deteriorate. The most recent su...
In this paper we explore the idea that the code that constitutes a program actually forms a higher-level, program-specific language. The symbols of the language are the abstracti...