Bayesian Folding-In with Dirichlet Kernels for PLSI


Download users.informatik.uni-halle.de

Probabilistic latent semantic indexing (PLSI) represents the documents of a collection as mixture proportions of latent topics, which are learned from the collection by an expectation-maximization (EM) algorithm. New documents or queries must be folded into the latent topic space by a simplified version of the EM algorithm. During PLSI folding-in of a new document, the topic mixtures of the known documents are ignored, which may lead to a suboptimal model of the extended collection. Our new approach incorporates the topic mixtures of the known documents in a Bayesian way during folding-in. This knowledge is modeled as a prior distribution over the topic simplex using a kernel density estimate with Dirichlet kernels. We demonstrate the advantages of the new Bayesian folding-in on real text data.
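A minimal Python sketch may help make the abstract concrete. It holds the trained word distributions P(w|z) fixed and re-estimates only the new document's topic mixture P(z|d). Without a prior it is plain maximum-likelihood folding-in. When the topic mixtures of the known documents are supplied, it places an equal-weight mixture of Dirichlet kernels (one per known document) over the topic simplex and returns a MAP estimate via a generalized EM that treats the kernel index as an extra latent variable. The kernel parametrization `alpha_k = 1 + s * theta_k`, the concentration `s`, and all function names are hypothetical illustrations, not the authors' method; the paper's kernel and bandwidth choices may differ.

```python
import math


def dirichlet_log_pdf(theta, alpha):
    """Log density of Dir(alpha) at a point theta on the topic simplex."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    # Floor theta to avoid log(0); terms with alpha == 1 contribute nothing anyway.
    return log_norm + sum((a - 1.0) * math.log(max(t, 1e-300))
                          for a, t in zip(alpha, theta))


def kernel_params(known_mixtures, s):
    """Hypothetical kernel: alpha_k = 1 + s * theta_k has its mode at theta_k."""
    return [[1.0 + s * t for t in theta_k] for theta_k in known_mixtures]


def fold_in(counts, p_w_given_z, known_mixtures=None, s=50.0, iters=100):
    """Estimate P(z | d_new) while keeping P(w | z) fixed (illustrative sketch).

    counts:         {word_id: n(d_new, w)} for the new document or query
    p_w_given_z:    list over topics z of lists over word ids w, P(w | z)
    known_mixtures: optional list of P(z | d) for the known documents; if given,
                    the prior is an equal-weight mixture of Dirichlet kernels
                    and the result is a MAP estimate instead of plain ML
    """
    K = len(p_w_given_z)
    theta = [1.0 / K] * K  # start from the uniform mixture
    alphas = kernel_params(known_mixtures, s) if known_mixtures else []
    for _ in range(iters):
        # E-step, likelihood part: expected topic counts n(d,w) * P(z | d, w)
        acc = [0.0] * K
        for w, n in counts.items():
            joint = [theta[z] * p_w_given_z[z][w] for z in range(K)]
            total = sum(joint)
            if total == 0.0:
                continue  # word has zero probability under every topic
            for z in range(K):
                acc[z] += n * joint[z] / total
        # E-step, prior part: responsibility of each kernel for the current theta
        if alphas:
            logs = [dirichlet_log_pdf(theta, a) for a in alphas]
            m = max(logs)
            r = [math.exp(l - m) for l in logs]  # equal kernel weights cancel
            rs = sum(r)
            for r_k, a in zip(r, alphas):
                for z in range(K):
                    acc[z] += (r_k / rs) * (a[z] - 1.0)
        # M-step: every coefficient is >= 0 because alpha >= 1, so renormalize
        total = sum(acc)
        if total == 0.0:
            break  # nothing to estimate from (e.g. empty document, no prior)
        theta = [c / total for c in acc]
    return theta


p = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]  # toy P(w|z): 2 topics, 4 words
print(fold_in({0: 3, 2: 1}, p))                                 # ML folding-in
print(fold_in({0: 3, 2: 1}, p, known_mixtures=[[0.1, 0.9]], s=4.0))  # MAP
```

Keeping every kernel parameter at least 1 makes each M-step coefficient non-negative, so the update is a simple renormalization. A larger `s` narrows the kernels and pulls the folded-in mixture more strongly toward nearby known documents; with `known_mixtures=None` the sketch reduces to ordinary PLSI folding-in.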

Alexander Hinneburg, Hans-Henning Gabriel, André Gohr


Data Mining | ICDM 2007 | Latent Semantic Indexing | Latent Topic | Topic Mixtures


Post Info

Added	03 Jun 2010
Updated	03 Jun 2010
Type	Conference
Year	2007
Where	ICDM
Authors	Alexander Hinneburg, Hans-Henning Gabriel, André Gohr
