Modeling Latent Biographic Attributes in Conversational Genres


Download www.aclweb.org

This paper presents and evaluates several original techniques for the latent classification of biographic attributes such as gender, age, and native language in diverse genres (conversation transcripts, email) and languages (Arabic, English). First, we present a novel partner-sensitive model for extracting biographic attributes in conversations, given the differences in lexical usage and discourse style such as those observed between same-gender and mixed-gender conversations. Then, we explore a rich variety of novel sociolinguistic and discourse-based features, including mean utterance length, passive/active usage, percentage domination of the conversation, speaking rate, and filler word usage. Cumulatively, up to 20% error reduction is achieved relative to the standard Boulis and Ostendorf (2005) algorithm for classifying individual conversations on Switchboard, and accuracy for gender detection on the Switchboard corpus (aggregate) and Gulf Arabic corpus exceeds 95%.
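As a rough illustration of three of the discourse-based features named in the abstract (mean utterance length, percentage domination of the conversation, and filler word usage), the following is a minimal sketch and not the authors' implementation. The function name `discourse_features`, the `FILLERS` list, the whitespace tokenization, and the toy transcript are all assumptions made for this example; the abstract does not specify how the features are computed.

```python
# Hypothetical filler list; the abstract does not give the one used in the paper.
FILLERS = {"uh", "um", "er", "hmm"}


def discourse_features(turns, speaker):
    """Compute simple per-speaker discourse features.

    turns: list of (speaker_id, utterance_text) pairs in conversation order.
    speaker: the speaker_id whose features are computed.
    Tokenization is plain whitespace splitting (an assumption for this sketch).
    """
    # Lowercased token lists for the target speaker's utterances.
    own = [text.lower().split() for spk, text in turns if spk == speaker]
    own_words = sum(len(utt) for utt in own)
    all_words = sum(len(text.split()) for _, text in turns)
    # Strip trailing punctuation before matching against the filler list.
    filler_count = sum(
        1 for utt in own for word in utt if word.strip(".,?!") in FILLERS
    )
    return {
        # Average number of words per utterance by this speaker.
        "mean_utterance_length": own_words / len(own) if own else 0.0,
        # Share of all words in the conversation spoken by this speaker.
        "domination": own_words / all_words if all_words else 0.0,
        # Fraction of this speaker's words that are fillers.
        "filler_rate": filler_count / own_words if own_words else 0.0,
    }


# Toy two-party transcript, invented for illustration only.
turns = [
    ("A", "Um, so what do you think about the new schedule?"),
    ("B", "It works."),
    ("A", "Uh, I guess it is fine for me too."),
    ("B", "Good."),
]
features_a = discourse_features(turns, "A")
features_b = discourse_features(turns, "B")
```

In a classifier, values like these would be appended to the lexical feature vector for each speaker. Speaking rate and passive/active usage are omitted here because they need timing information and syntactic analysis that a plain transcript does not provide.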

Nikesh Garera, David Yarowsky

ACL 2009 | Biographic Attributes | Computational Linguistics | Conversations | Filler Word Usage

Added	16 Feb 2011
Updated	16 Feb 2011
Type	Journal
Year	2009
Where	ACL
Authors	Nikesh Garera, David Yarowsky
