EMNLP
2008

N-gram Weighting: Reducing Training Data Mismatch in Cross-Domain Language Model Estimation

In domains with insufficient matched training data, language models are often constructed by interpolating component models trained from partially matched corpora. Since the n-grams from such corpora may not be of equal relevance to the target domain, we propose an n-gram weighting technique to adjust the component n-gram probabilities based on features derived from readily available segmentation and metadata information for each corpus. Using a log-linear combination of such features, the resulting model achieves up to a
Bo-June Paul Hsu, James R. Glass
Added: 29 Oct 2010
Updated: 29 Oct 2010
Type: Conference
Year: 2008
Where: EMNLP
Authors: Bo-June Paul Hsu, James R. Glass
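
The idea in the abstract (scaling each component model's n-gram probabilities by a weight that is a log-linear function of per-n-gram features, then interpolating) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the weight takes the form exp(theta · features), that the weighted components are combined by linear interpolation for one fixed history, and that the result is renormalized over the vocabulary. The function names, the dictionary layout, and the feature values are all hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, List, Sequence


def ngram_weight(features: Sequence[float], theta: Sequence[float]) -> float:
    """Log-linear weight exp(theta . features) for one n-gram (assumed form)."""
    return math.exp(sum(t * f for t, f in zip(theta, features)))


def weighted_interpolation(
    component_probs: List[Dict[str, float]],
    component_features: List[Dict[str, List[float]]],
    lambdas: Sequence[float],
    thetas: Sequence[Sequence[float]],
) -> Dict[str, float]:
    """Combine component models for a single fixed history h.

    component_probs[i][w]    : p_i(w | h) from component model i
    component_features[i][w] : hypothetical feature vector of n-gram (h, w)
                               in corpus i; a missing entry gives weight 1
    lambdas[i]               : interpolation weight of component i
    thetas[i]                : log-linear feature parameters of component i
    """
    vocab = sorted(set().union(*component_probs))
    scores = {}
    for w in vocab:
        total = 0.0
        for probs, feats, lam, theta in zip(
            component_probs, component_features, lambdas, thetas
        ):
            # Scale the component probability by its n-gram weight.
            beta = ngram_weight(feats.get(w, []), theta)
            total += lam * beta * probs.get(w, 0.0)
        scores[w] = total
    # Renormalize so the weighted mixture is a distribution over the vocabulary.
    z = sum(scores.values())
    return {w: s / z for w, s in scores.items()}


# Toy data: two component models over a two-word vocabulary.
p1 = {"a": 0.5, "b": 0.5}
p2 = {"a": 0.2, "b": 0.8}
f1 = {"a": [1.0], "b": [0.0]}  # made-up relevance feature for corpus 1
f2 = {"a": [0.0], "b": [0.0]}

# With all feature parameters at zero, every weight is 1 and the result
# reduces to ordinary linear interpolation.
plain = weighted_interpolation([p1, p2], [f1, f2], [0.5, 0.5], [[0.0], [0.0]])

# A positive parameter boosts n-grams in corpus 1 whose feature is active.
boosted = weighted_interpolation([p1, p2], [f1, f2], [0.5, 0.5], [[1.0], [0.0]])
```

In this sketch, setting every parameter in `thetas` to zero recovers plain linear interpolation, so the weighting acts as an extension of the baseline rather than a replacement. How the parameters would be tuned (for example, on held-out target-domain data) is outside what the abstract states and is not shown here.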