
13 years 4 months ago
Unsupervised Discriminative Language Model Training for Machine Translation using Simulated Confusion Sets
An unsupervised discriminative training procedure is proposed for estimating a language model (LM) for machine translation (MT). An English-to-English synchronous context-free gra...
Zhifei Li, Ziyuan Wang, Sanjeev Khudanpur, Jason E...
13 years 4 months ago
A framework for representing lexical resources
Our goal is to propose a description model for the lexicon. We describe a software framework for representing the lexicon and its variations called Proteus. Various examples show ...
Fabrice Issac
13 years 4 months ago
Extraction of Multi-word Expressions from Small Parallel Corpora
We present a general methodology for extracting multi-word expressions (of various types), along with their translations, from small parallel corpora. We automatically align the p...
Yulia Tsvetkov, Shuly Wintner
13 years 4 months ago
Towards the Adequate Evaluation of Morphosyntactic Taggers
There exists a well-established and almost unanimously adopted measure of tagger performance, namely, accuracy. Although it is perfectly adequate for small tagsets and typical app...
Szymon Acedanski, Adam Przepiórkowski
13 years 4 months ago
A Review Selection Approach for Accurate Feature Rating Estimation
In this paper, we propose a review selection approach towards accurate estimation of feature ratings for services on participatory websites where users write textual reviews for t...
Chong Long, Jie Zhang, Xiaoyan Zhu
13 years 4 months ago
Discriminant Ranking for Efficient Treebanking
Treebank annotation is a labor-intensive and time-consuming task. In this paper, we show that a simple statistical ranking model can significantly improve treebanking efficiency b...
Yi Zhang, Valia Kordoni
13 years 4 months ago
Shallow Information Extraction from Medical Forum Data
We study a novel shallow information extraction problem that involves extracting sentences of a given set of topic categories from medical forum data. Given a corpus of medical fo...
Parikshit Sondhi, Manish Gupta, ChengXiang Zhai, J...
13 years 4 months ago
Building Systematic Reviews Using Automatic Text Classification Techniques
The amount of information in medical publications continues to increase at a tremendous rate. Systematic reviews help to process this growing body of information. They are fundame...
Oana Frunza, Diana Inkpen, Stan Matwin
13 years 4 months ago
A Global Relaxation Labeling Approach to Coreference Resolution
This paper describes the participation of RelaxCor in SemEval-2010 Task 1: "Coreference Resolution in Multiple Languages". RelaxCor is a constraint-based grap...
Emili Sapena, Lluís Padró, Jordi Tur...