Discovering Sociolinguistic Associations with Structured Sparsity

We present a method to discover robust and interpretable sociolinguistic associations from raw geotagged text data. Using aggregate demographic statistics about the authors’ geographic communities, we solve a multi-output regression problem between demographics and lexical frequencies. By imposing a composite ℓ1,∞ regularizer, we obtain structured sparsity, driving entire rows of coefficients to zero. We perform two regression studies. First, we use term frequencies to predict demographic attributes; our method identifies a compact set of words that are strongly associated with author demographics. Next, we conjoin demographic attributes into features, which we use to predict term frequencies. The composite regularizer identifies a small number of features, which correspond to communities of authors united by shared demographic and linguistic properties.
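To illustrate how a composite ℓ1,∞ regularizer can drive entire rows of a coefficient matrix to zero, here is a minimal sketch in Python with NumPy. It is not the authors' solver: it assumes a squared-error loss and uses plain proximal gradient descent, and the names `project_l1_ball`, `prox_linf`, and `fit_l1_inf` are hypothetical helpers introduced only for this example. Each row of the coefficient matrix `B` holds one input feature's weights across all outputs, and the penalty is `lam` times the sum of the rows' ℓ∞ norms.

```python
import numpy as np


def project_l1_ball(v, radius):
    """Euclidean projection of vector v onto the l1 ball of the given radius."""
    abs_v = np.abs(v)
    if abs_v.sum() <= radius:
        return v.copy()  # already inside the ball
    u = np.sort(abs_v)[::-1]            # magnitudes in decreasing order
    css = np.cumsum(u)
    k = np.arange(1, len(u) + 1)
    rho = np.nonzero(u * k > css - radius)[0][-1]
    theta = (css[rho] - radius) / (rho + 1)
    return np.sign(v) * np.maximum(abs_v - theta, 0.0)


def prox_linf(v, t):
    """Proximal operator of t * ||v||_inf, via the Moreau decomposition.

    If ||v||_1 <= t, the projection equals v and the whole row becomes zero.
    """
    return v - project_l1_ball(v, t)


def fit_l1_inf(X, Y, lam, n_iter=500):
    """Minimize 0.5 * ||Y - X B||_F^2 + lam * sum_r ||B[r, :]||_inf.

    Assumed setup for illustration: X is (n, d), Y is (n, m), B is (d, m).
    """
    step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant of the gradient
    B = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        grad = X.T @ (X @ B - Y)             # gradient of the squared-error term
        Z = B - step * grad
        B = np.vstack([prox_linf(z, step * lam) for z in Z])
    return B


if __name__ == "__main__":
    # Synthetic data: only the first two of ten features matter.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 10))
    B_true = np.zeros((10, 3))
    B_true[:2] = rng.standard_normal((2, 3))
    Y = X @ B_true + 0.1 * rng.standard_normal((200, 3))
    B_hat = fit_l1_inf(X, Y, lam=20.0)
    print("rows with nonzero coefficients:", np.nonzero(np.abs(B_hat).sum(axis=1))[0])
```

The row-wise proximal step is where the structured sparsity comes from: a row whose ℓ1 norm falls below the threshold is set to zero in every output at once, so a feature (a word, in the first study) is either kept for all outputs or dropped entirely.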

Jacob Eisenstein, Noah A. Smith, Eric P. Xing

ACL 2011 | Computational Linguistics | Geographic Communities | Linguistic Properties | Regression Problem |

Post Info

Added	23 Aug 2011
Updated	23 Aug 2011
Type	Journal
Year	2011
Where	ACL
Authors	Jacob Eisenstein, Noah A. Smith, Eric P. Xing