

Topic Cube: Topic Modeling for OLAP on Multidimensional Text Databases.

As the amount of textual information grows explosively in various kinds of business systems, it becomes increasingly desirable to analyze structured data records and unstructured text data simultaneously. While online analytical processing (OLAP) techniques have proven very useful for analyzing and mining structured data, they face challenges in handling text data. On the other hand, probabilistic topic models are among the most effective approaches to latent topic analysis and mining on text data. In this paper, we propose a new data model called topic cube to combine OLAP with probabilistic topic modeling and enable OLAP on the dimension of text data in a multidimensional text database. Topic cube extends the traditional data cube to cope with a topic hierarchy and store probabilistic content measures of text documents learned through a probabilistic topic model. To materialize topic cubes efficiently, we propose a heuristic method to speed up the iterative EM algorithm ...
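The abstract describes two computational pieces. The first is a probabilistic topic model fitted by EM on the documents of each cube cell. The second is a heuristic to speed up that EM when materializing the cube. The sketch below illustrates the general idea only and is not the authors' method. It makes several assumptions. It uses PLSA as the topic model. It has a single structured dimension, `year`. It uses a flat set of `k` topics instead of a topic hierarchy. Its warm start is hypothetical: it averages aligned child-cell topic-word distributions to initialize EM in the parent cell. All function names and the toy records are hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_phi(vocab, k, seed=0):
    """Random initial topic-word distributions p(w|z), one dict per topic."""
    rng = random.Random(seed)
    phi = []
    for _ in range(k):
        raw = {w: rng.random() + 0.01 for w in vocab}
        s = sum(raw.values())
        phi.append({w: v / s for w, v in raw.items()})
    return phi


def plsa_em(docs, vocab, init_phi, iters=30):
    """Fit PLSA by EM starting from init_phi.

    docs: list of token lists. Returns (phi, theta) with phi[z][w] = p(w|z)
    and theta[d][z] = p(z|d).
    """
    k = len(init_phi)
    phi = [dict(t) for t in init_phi]
    theta = [[1.0 / k] * k for _ in docs]
    counts = [Counter(d) for d in docs]
    for _ in range(iters):
        word_topic = [defaultdict(float) for _ in range(k)]
        new_theta = []
        for d, c in enumerate(counts):
            doc_topic = [0.0] * k
            for w, n in c.items():
                # E-step: p(z|d,w) is proportional to p(z|d) * p(w|z).
                post = [theta[d][z] * phi[z].get(w, 0.0) for z in range(k)]
                s = sum(post)
                if s == 0.0:
                    continue
                for z in range(k):
                    expected = n * post[z] / s
                    word_topic[z][w] += expected
                    doc_topic[z] += expected
            total = sum(doc_topic)
            new_theta.append([x / total for x in doc_topic] if total else [1.0 / k] * k)
        # M-step: renormalize expected counts into distributions.
        phi = []
        for z in range(k):
            s = sum(word_topic[z].values())
            phi.append({w: word_topic[z][w] / s if s else 1.0 / len(vocab) for w in vocab})
        theta = new_theta
    return phi, theta


def build_cells(records, dim):
    """Group (attributes, tokens) records into cube cells keyed by one dimension."""
    cells = defaultdict(list)
    for attrs, tokens in records:
        cells[attrs[dim]].append(tokens)
    return dict(cells)


def merge_children(child_phis, weights):
    """Hypothetical warm start (an assumption, not the paper's exact heuristic):
    weighted average of child p(w|z), assuming topic indices are aligned."""
    k = len(child_phis[0])
    total = sum(weights)
    merged = []
    for z in range(k):
        acc = defaultdict(float)
        for phi, wt in zip(child_phis, weights):
            for w, p in phi[z].items():
                acc[w] += wt * p / total
        merged.append(dict(acc))
    return merged


# Toy multidimensional text database: structured attributes plus text.
records = [
    ({"year": 2008}, "engine failure engine noise".split()),
    ({"year": 2008}, "flight delay weather delay".split()),
    ({"year": 2009}, "engine noise cabin".split()),
    ({"year": 2009}, "weather delay flight".split()),
]
vocab = sorted({w for _, toks in records for w in toks})
k = 2
shared_init = random_phi(vocab, k, seed=42)

# Child cells: one topic model per year, all started from the same init.
cells = build_cells(records, "year")
child = {key: plsa_em(docs, vocab, shared_init)[0] for key, docs in cells.items()}
weights = [sum(len(d) for d in cells[key]) for key in child]
warm = merge_children([child[key] for key in child], weights)

# Parent ("all years") cell: a few EM iterations from the warm start.
parent_phi, parent_theta = plsa_em([t for _, t in records], vocab, warm, iters=5)
```

Every child cell starts from the same `shared_init`, and the sketch relies on that to assume topic index `z` means roughly the same topic across cells. Without some such alignment, averaging child distributions would not be meaningful. The intended payoff is that the parent cell needs fewer EM iterations than a cold start would. This toy example does not measure that.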
ChengXiang Zhai, Duo Zhang, Jiawei Han
Added 07 Mar 2010
Updated 07 Mar 2010
Type Conference
Year 2009
Where SDM (SIAM International Conference on Data Mining)