Most current machine transliteration systems employ a corpus of known source-target word pairs to train their system, and typically evaluate their systems on a similar corpus. In t...
Written documents created through dictation differ significantly from a true verbatim transcript of the recorded speech. This poses an obstacle in automatic dictation systems as s...
Maximilian Bisani, Paul Vozila, Olivier Divay, Jef...
Roget's Thesaurus has gone through many revisions since it was first published 150 years ago. But how do these revisions affect Roget's usefulness for NLP? We examine th...
In this paper we provide a formalization of a set of default rules that we claim are required for the transfer of information such as causation, event rate and duration in the int...
Rodrigo Agerri, John A. Barnden, Mark G. Lee, Alan...
This paper describes our work on building a Part-of-Speech (POS) tagger for Bengali. We have used Hidden Markov Model (HMM) and Maximum Entropy (ME) based stochastic taggers. Bengali...
This paper presents a comparative evaluation of several state-of-the-art English parsers based on different frameworks. Our approach is to measure the impact of each parser when i...
This paper presents the results of experiments in which we tested different kinds of features for retrieval of Chinese opinionated texts. We assume that the task of retrieval of o...
Conventional n-best reranking techniques often suffer from the limited scope of the n-best list, which rules out many potentially good alternatives. We instead propose forest reran...
Recent studies suggest that machine learning can be applied to develop good automatic evaluation metrics for machine translated sentences. This paper further analyzes aspects of l...