Topic-based text summaries promise to help average users quickly understand a text collection and derive insights. Recent research has shown that the Latent Dirichlet Allocation (LDA) model is one of the most effective approaches to topic analysis. However, LDA-based results may not be ideal for human understanding and consumption. In this paper, we present several topic and keyword re-ranking approaches that help users better understand and consume LDA-derived topics in their text analysis. Our methods process the LDA output based on a set of criteria that model a user’s information needs. Our evaluation demonstrates the usefulness of these methods in summarizing several large-scale, real-world data sets.

Categories and Subject Descriptors: I.2.6 [Artificial Intelligence]: Learning; H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing

General Terms: Algorithms, Experimentation

Yangqiu Song, Shimei Pan, Shixia Liu, Michelle X.