One of the key tasks for analyzing conversational data is segmenting it into coherent topic segments. However, most models of topic segmentation ignore the social aspect of conversations, focusing only on the words used. We introduce a hierarchical Bayesian nonparametric model, Speaker Identity for Topic Segmentation (SITS), that discovers (1) the topics used in a conversation, (2) how these topics are shared across conversations, (3) when these topics shift, and (4) a person-specific tendency to introduce new topics. We evaluate against current unsupervised segmentation models to show that including personspecific information improves segmentation performance on meeting corpora and on political debates. Moreover, we provide evidence that SITS captures an individual’s tendency to introduce new topics in political contexts, via analysis of the 2008 US presidential debates and the television program Crossfire. 1 Topic Segmentation as a Social Process Conversation, interactive discu...
Viet-An Nguyen, Jordan L. Boyd-Graber, Philip Resn