Near-duplicate web documents are abundant. Two such documents differ from each other only in a very small portion, for example the part that displays advertisements. Such differences are irrele...
Similarity measures assign a numeric score indicating how closely two documents, or a document and a query, match. The Cosine measure is one of the similarity m...
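To make the Cosine measure concrete, here is a minimal sketch in Python. It is not taken from the abstract above; it assumes whitespace tokenization, lowercasing, and raw term-frequency weights, and the function name `cosine_similarity` is a hypothetical choice for illustration. The score is the dot product of the two term-frequency vectors divided by the product of their Euclidean lengths.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine of the angle between raw term-frequency vectors of two texts.

    Assumptions (illustrative only): whitespace tokenization, lowercasing,
    and unweighted term counts rather than tf-idf weights.
    """
    tf_a = Counter(text_a.lower().split())
    tf_b = Counter(text_b.lower().split())

    # Only terms present in both texts contribute to the dot product.
    dot = sum(tf_a[term] * tf_b[term] for term in tf_a.keys() & tf_b.keys())

    norm_a = math.sqrt(sum(count * count for count in tf_a.values()))
    norm_b = math.sqrt(sum(count * count for count in tf_b.values()))

    # An empty text has a zero-length vector; define its similarity as 0.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

With non-negative term counts the score lies between 0 (no shared terms) and 1 (identical term distributions). The same function can score a document against a query by passing the query text as one of the arguments.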
Abstract – The method of latent semantic indexing (LSI) is well known for tackling the synonymy and polysemy problems in information retrieval. However, its performance can be ve...
A Question Answering (QA) system aims to return exact answers to natural language questions. While today's information retrieval techniques are quite successful at locating within l...