The language MIX consists of all strings over the three-letter alphabet {a, b, c} that contain an equal number of occurrences of each letter. We prove Joshi’s (1985) conjecture ...
In this paper we introduce the novel task of “word epoch disambiguation,” defined as the problem of identifying changes in word usage over time. Through experiments run using...
We introduce a spectral learning algorithm for latent-variable PCFGs (Petrov et al., 2006). Under a separability (singular value) condition, we prove that the method provides cons...
Shay B. Cohen, Karl Stratos, Michael Collins, Dean...
This paper describes Movie-DiC a Movie Dialogue Corpus recently collected for research and development purposes. The collected dataset comprises 132,229 dialogues containing a tot...
The question of how meaning might be acquired by young children and represented by adult speakers of a language is one of the most debated topics in cognitive science. Existing se...