Sciweavers

EMNLP
2008
Triplet Lexicon Models for Statistical Machine Translation
This paper describes a lexical trigger model for statistical machine translation. We present various methods using triplets incorporating long-distance dependencies that can go be...
Sasa Hasan, Juri Ganitkevitch, Hermann Ney, Jesú...
EMNLP
2008
Lattice-based Minimum Error Rate Training for Statistical Machine Translation
Minimum Error Rate Training (MERT) is an effective means to estimate the feature function weights of a linear model such that an automated evaluation criterion for measuring syste...
Wolfgang Macherey, Franz Josef Och, Ignacio Thayer...
EMNLP
2008
Scalable Language Processing Algorithms for the Masses: A Case Study in Computing Word Co-occurrence Matrices with MapReduce
This paper explores the challenge of scaling up language processing algorithms to increasingly large datasets. While cluster computing has been available in commercial environment...
Jimmy J. Lin
EMNLP
2008
Learning with Probabilistic Features for Improved Pipeline Models
We present a novel learning framework for pipeline models aimed at improving the communication between consecutive stages in a pipeline. Our method exploits the confidence scores ...
Razvan C. Bunescu
EMNLP
2008
Studying the History of Ideas Using Topic Models
How can the development of ideas in a scientific field be studied over time? We apply unsupervised topic modeling to the ACL Anthology to analyze historical trends in the field of...
David Hall, Daniel Jurafsky, Christopher D. Mannin...
EMNLP
2008
Language and Translation Model Adaptation using Comparable Corpora
Traditionally, statistical machine translation systems have relied on parallel bi-lingual data to train a translation model. While bi-lingual parallel data are expensive to genera...
Matthew G. Snover, Bonnie J. Dorr, Richard M. Schw...
EMNLP
2008
An Analysis of Active Learning Strategies for Sequence Labeling Tasks
Active learning is well-suited to many problems in natural language processing, where unlabeled data may be abundant but annotation is slow and expensive. This paper aims to shed ...
Burr Settles, Mark Craven
EMNLP
2008
Seed and Grow: Augmenting Statistically Generated Summary Sentences using Schematic Word Patterns
We examine the problem of content selection in statistical novel sentence generation. Our approach models the processes performed by professional editors when incorporating materi...
Stephen Wan, Robert Dale, Mark Dras, Cécile...
EMNLP
2008
Specialized Models and Ranking for Coreference Resolution
This paper investigates two strategies for improving coreference resolution: (1) training separate models that specialize in particular types of mentions (e.g., pronouns versus pr...
Pascal Denis, Jason Baldridge