Conditional random fields (CRFs) have been quite successful in various machine learning tasks. However, as ever-larger data sets become acceptable for the current computational ma...
Abstract. For casual web users, natural language is more accessible than formal query languages. However, understanding a natural language query is not trivial for computer sy...
Tae-Gil Noh, Yong-Jin Han, Seong-Bae Park, Se-Youn...
Abstract. This paper describes a methodology for constructing aligned German-Chinese corpora from movie subtitles. The corpora will be used to train a special machine translation s...
This paper attacks a Japanese syllable-substitution cipher. We use a probabilistic, noisy-channel framework, exploiting various Japanese language models to drive the decipherment. ...
Abstract. Telugu is the third most spoken language in India and one of the fifteen most spoken languages in the world. However, there is no standardized input method for Telugu, which ...
Abstract. In a spoken dialogue system, speech recognition performance accounts for the largest part of the overall system performance. Yet spontaneous speech recognition has an...
In a knowledge society, it is highly important to accumulate spoken documents on the web. However, because of the high redundancy of spontaneous speech, the transcribed text in i...
Abstract. We propose a lexicalized syntactic reordering framework for cross-language word alignment and translation research. In this framework, we first flatten hierarchical sourc...