Latent Morpho-Semantic Analysis: Multilingual Information Retrieval with Character N-Grams and Mutual Information

Download www.aclweb.org

We describe an entirely statistics-based, unsupervised, and language-independent approach to multilingual information retrieval, which we call Latent Morpho-Semantic Analysis (LMSA). LMSA overcomes some of the shortcomings of related previous approaches such as Latent Semantic Analysis (LSA). LMSA has an important theoretical advantage over LSA: it combines well-known techniques in a novel way to break the terms of LSA down into units which correspond more closely to morphemes. Thus, it has a particular appeal for use with morphologically complex languages such as Arabic. We show through empirical results that the theoretical advantages of LMSA can translate into significant gains in precision in multilingual information retrieval tests. These gains are not matched either when a standard stemmer is used with LSA, or when terms are indiscriminately broken down into n-grams.
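
To make the idea of breaking terms into sub-word units concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not say how mutual information is applied, so the scoring used here (split a word between the two adjacent characters with the lowest pointwise mutual information) is an assumption chosen for illustration. The function names and the toy vocabulary are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(word, n):
    """Return all contiguous character n-grams of `word`, left to right."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def bigram_pmi(vocabulary):
    """Estimate pointwise mutual information for each adjacent character pair.

    PMI(a, b) = log( p(ab) / (p(a) * p(b)) ), with probabilities taken as
    relative frequencies over the characters and bigrams in `vocabulary`.
    """
    unigrams = Counter(ch for w in vocabulary for ch in w)
    bigrams = Counter(bg for w in vocabulary for bg in char_ngrams(w, 2))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    return {
        bg: math.log((count / n_bi) /
                     ((unigrams[bg[0]] / n_uni) * (unigrams[bg[1]] / n_uni)))
        for bg, count in bigrams.items()
    }


def split_at_weakest_link(word, pmi):
    """Split `word` in two where adjacent characters have the lowest PMI.

    Assumption (not from the paper): a low-PMI character pair marks a likely
    morpheme boundary. Unseen pairs count as the weakest possible link.
    """
    if len(word) < 2:
        return (word,)
    cut = min(range(1, len(word)),
              key=lambda i: pmi.get(word[i - 1:i + 1], float("-inf")))
    return (word[:cut], word[cut:])


# Toy vocabulary, invented for this example only.
vocab = ["walked", "walking", "talked", "talking", "jumped", "jumping"]
pmi = bigram_pmi(vocab)
print(char_ngrams("walked", 3))  # ['wal', 'alk', 'lke', 'ked']
print(split_at_weakest_link("walked", pmi))
```

Taking every n-gram, as `char_ngrams` does, corresponds to the indiscriminate breakdown the abstract contrasts with LMSA; a statistical score such as the one in `split_at_weakest_link` is one possible way to select among candidate units instead.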

Peter A. Chew, Brett W. Bader, Ahmed Abdelali

COLING 2008 | Computational Linguistics | Latent Semantic Analysis | Multilingual Information Retrieval | Theoretical Advantages

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2008
Where	COLING
Authors	Peter A. Chew, Brett W. Bader, Ahmed Abdelali
