Starting from first principles, we re-visit the statistical approach and study two forms of the Bayes decision rule: the common rule for minimizing the number of string errors and...
State-of-the-art machine translation techniques are still far from producing high quality translations. This drawback leads us to introduce an alternative approach to the translat...
Jorge Civera, Elsa Cubel, Antonio L. Lagarda, Davi...
This paper proposes a novel method to compile statistical models for machine translation to achieve efficient decoding. In our method, each statistical submodel is represented by ...
This work applies boosted wrapper induction (BWI), a machine learning algorithm for information extraction from semi-structured documents, to the problem of named entity recogniti...
This paper presents Japanese morphological analysis based on conditional random fields (CRFs). Previous work in CRFs assumed that observation sequence (word) boundaries were fixed...
Anticipating the availability of large questionanswer datasets, we propose a principled, datadriven Instance-Based approach to Question Answering. Most question answering systems ...
We compare and contrast the strengths and weaknesses of a syntax-based machine translation model with a phrase-based machine translation model on several levels. We briefly descr...
Steve DeNeefe, Kevin Knight, Wei Wang 0006, Daniel...
We address the problem of smoothing translation probabilities in a bilingual N-grambased statistical machine translation system. It is proposed to project the bilingual tuples ont...
We explore the use of Wikipedia as external knowledge to improve named entity recognition (NER). Our method retrieves the corresponding Wikipedia entry for each candidate word seq...
This paper discusses automatic determination of case in Arabic. This task is an important part and major source of errors in full diacritization of Arabic. We use a goldstandard s...
Nizar Habash, Ryan Gabbard, Owen Rambow, Seth Kuli...