We address the problem of selecting nondomain-specific language model training data to build auxiliary language models for use in tasks such as machine translation. Our approach i...
English noun/verb (N/V) pairs (contract, cement) have undergone complex patterns of change between 3 stress patterns for several centuries. We describe a longitudinal dataset of N...
The rapid growth of geotagged social media raises new computational possibilities for investigating geographic linguistic variation. In this paper, we present a multi-level genera...
Jacob Eisenstein, Brendan O'Connor, Noah A. Smith,...
In this paper, we present an unsupervised hybrid model which combines statistical, lexical, linguistic, contextual, and temporal features in a generic EMbased framework to harvest...
In this paper we propose a method for the automatic decipherment of lost languages. Given a non-parallel corpus in a known related language, our model produces both alphabetic map...