Word Clustering Using PLSA Enhanced with Long Distance Bigrams



Probabilistic latent semantic analysis (PLSA) is enhanced with long distance bigram models in order to improve word clustering. The long distance bigram probabilities and the interpolated long distance bigram probabilities at varying distances within a context capture different aspects of contextual information. In addition, the baseline bigram, which incorporates trigger-pairs for various histories, is tested in the same framework. The experimental results collected on publicly available corpora (CISI, Cranfield, Medline, and NPL) demonstrate the superiority of the long distance bigrams over the baseline bigrams, as well as the superiority of the interpolated long distance bigrams over both the long distance bigrams and the baseline bigram with trigger-pairs, in yielding more compact clusters containing fewer outliers.
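To make the two probability models named in the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes a distance-d bigram is a maximum-likelihood estimate over word pairs that occur exactly d positions apart, and that the interpolated model is a weighted sum of those estimates over distances 1 to D. The function names, the toy sentence, and the uniform default weights are all hypothetical; the abstract does not say how the interpolation weights are obtained, and the PLSA step itself is not shown.

```python
from collections import Counter


def distance_bigram_probs(tokens, d):
    """Maximum-likelihood estimate of P_d(w | h): w occurs exactly d positions after h."""
    # Pair each token with the token d positions later.
    pair_counts = Counter(zip(tokens, tokens[d:]))
    # Only tokens that have a successor at distance d can act as a history.
    history_counts = Counter(tokens[:len(tokens) - d])
    return {(h, w): c / history_counts[h] for (h, w), c in pair_counts.items()}


def interpolated_probs(tokens, max_distance, weights=None):
    """Mix the distance-1..max_distance estimates with weights that sum to 1."""
    if weights is None:
        # Assumption for illustration only: uniform interpolation weights.
        weights = [1.0 / max_distance] * max_distance
    mixed = Counter()
    for d, lam in zip(range(1, max_distance + 1), weights):
        for pair, p in distance_bigram_probs(tokens, d).items():
            mixed[pair] += lam * p
    return dict(mixed)


# Hypothetical toy corpus, not taken from CISI, Cranfield, Medline, or NPL.
tokens = "the cat sat on the mat".split()
p1 = distance_bigram_probs(tokens, 1)
p2 = distance_bigram_probs(tokens, 2)
mixed = interpolated_probs(tokens, 2)
```

In this toy example the distance-2 model links "the" to "sat", a pair the ordinary (distance-1) bigram never observes, which is one way to read the abstract's remark that different distances capture different aspects of context.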

Nikoletta Bassiou, Constantine Kotropoulos


Computer Vision | Distance Bigram Probabilities | ICPR 2010 | Interpolated Long Distance | Long Distance Bigrams


Post Info

Added	12 Feb 2011
Updated	12 Feb 2011
Type	Journal
Year	2010
Where	ICPR
Authors	Nikoletta Bassiou, Constantine Kotropoulos
