Latent semantic indexing (LSI) fails for TREC collections

The aim of latent semantic indexing (LSI) is to uncover the relationships between terms, hidden concepts, and documents. LSI uses the matrix factorization technique known as singular value decomposition (SVD). In this paper, we apply LSI to standard benchmark collections. We find that LSI yields poor retrieval accuracy on the TREC 2, 7, 8, and 2004 collections. We believe that the negative result is robust, because we try more LSI variants than any previous work. First, we show that using Okapi BM25 weights for terms in documents improves the performance of LSI. Second, we derive novel scoring methods that implement the ideas of query expansion and score regularization in the LSI framework. Third, we show how to combine the BM25 method with LSI methods. All proposed methods are evaluated experimentally on the four TREC collections mentioned above. The experiments show that the new variants of LSI improve upon previous LSI methods. Nevertheless, no way of using LSI achieves a worthwhil...
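The basic pipeline the abstract describes can be sketched in NumPy. The sketch weights a term-by-document count matrix with Okapi BM25, takes a rank-k truncated SVD, folds the query into the k-dimensional latent space, and ranks documents by cosine similarity. Several choices here are illustrative assumptions, not the authors' method:

- the function names;
- the particular BM25 IDF variant, the non-negative `log(1 + (N - df + 0.5) / (df + 0.5))` form;
- the binary query vector;
- the linear interpolation with `alpha` used to combine BM25 and LSI scores.

The abstract does not give the authors' exact weighting, scoring, query-expansion, score-regularization, or combination formulas.

```python
import numpy as np


def bm25_weights(tf, k1=1.2, b=0.75):
    """Return a BM25-weighted copy of a term-by-document count matrix.

    tf has shape (n_terms, n_docs). Assumes every document is non-empty.
    The IDF form used here is one common non-negative variant (an assumption).
    """
    tf = np.asarray(tf, dtype=float)
    n_docs = tf.shape[1]
    doc_len = tf.sum(axis=0)                        # length of each document
    avg_len = doc_len.mean()
    df = (tf > 0).sum(axis=1)                       # document frequency of each term
    idf = np.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
    norm = k1 * (1.0 - b + b * doc_len / avg_len)   # per-document length normalization
    sat = tf * (k1 + 1.0) / (tf + norm)             # saturating TF, broadcast over columns
    return idf[:, None] * sat


def lsi_fit(W, k):
    """Rank-k truncated SVD of the weighted matrix: W ~ U_k diag(s_k) Vt_k."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]


def lsi_scores(U_k, s_k, Vt_k, q):
    """Cosine similarity between the folded-in query and every document."""
    docs = (s_k[:, None] * Vt_k).T      # row j equals U_k^T w_j for document column w_j
    q_hat = U_k.T @ q                   # fold the query into the same k-dim space
    denom = np.linalg.norm(docs, axis=1) * np.linalg.norm(q_hat)
    return docs @ q_hat / np.where(denom == 0, 1.0, denom)


def combined_scores(W, U_k, s_k, Vt_k, q, alpha=0.5):
    """Hypothetical linear mix of max-normalized BM25 scores and LSI cosines."""
    bm25 = W.T @ q                      # BM25 score: summed weights of the query terms
    if bm25.max() > 0:
        bm25 = bm25 / bm25.max()
    return alpha * bm25 + (1.0 - alpha) * lsi_scores(U_k, s_k, Vt_k, q)


if __name__ == "__main__":
    # Toy collection. Terms: car, automobile, engine, flower, petal.
    # Docs: d0 "car engine", d1 "automobile engine", d2 "flower petal",
    # d3 "car automobile".
    tf = np.array([
        [1, 0, 0, 1],
        [0, 1, 0, 1],
        [1, 1, 0, 0],
        [0, 0, 1, 0],
        [0, 0, 1, 0],
    ])
    W = bm25_weights(tf)
    U_k, s_k, Vt_k = lsi_fit(W, k=2)
    q = np.array([1.0, 0.0, 0.0, 0.0, 0.0])  # query "car" as a binary term vector
    print("LSI:     ", np.round(lsi_scores(U_k, s_k, Vt_k, q), 3))
    print("Combined:", np.round(combined_scores(W, U_k, s_k, Vt_k, q), 3))
```

In this toy example, rank-2 LSI gives d1 ("automobile engine") the same cosine score as d0 for the query "car", although d1 never contains "car". This is the term-relationship effect LSI is meant to provide, and plain BM25 scores d1 as zero. The abstract reports that on the TREC 2, 7, 8, and 2004 collections, LSI variants of this kind nevertheless yield poor retrieval accuracy.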

Avinash Atreya, Charles Elkan

Information Technology | LSI Framework | LSI Methods | Retrieval Accuracy | SIGKDD 2010

Post Info

Added	21 May 2011
Updated	21 May 2011
Type	Journal
Year	2010
Where	SIGKDD
Authors	Avinash Atreya, Charles Elkan

Sciweavers
