

Adapting LSI for Fine-Grained and Multi-Level Document Comparison

In recent years, Latent Semantic Indexing (LSI) has been recognized as an effective tool for information retrieval in text documents. The level of "granularity" in LSI (i.e., whether LSI is performed on documents, paragraphs, sentences, phrases, etc.) is a limiting factor, in that LSI comparisons can be made only at the level of granularity chosen. Here we argue that, as long as a record of the document structure is maintained, the level of granularity may be arbitrarily fine while still allowing for comparison at any coarser granularity. We show that the reduced-dimension vector for any particular section of a document is a function of the vectors of its constituent subsections. Using this result, we illustrate how LSI can be used to compare documents at multiple structural levels. One possible application (automated plagiarism detection) is discussed as an example of how this method of multilevel comparison may be used to improve query time in fine-gran...
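The claim that a section's reduced-dimension vector is a function of its subsections' vectors can be illustrated with a minimal sketch. It assumes the standard LSI fold-in, which is a linear projection of a raw term-count vector, so a section's vector is simply the sum of its subsections' vectors; the exact function used in the paper is not given in the abstract and may differ (for example, if weighting or normalization is applied). The matrix `P`, the helper names, and the term counts below are all hypothetical.

```python
import math

# Hypothetical 2 x 4 fold-in matrix standing in for inv(Sigma_k) @ U_k.T from a
# truncated SVD; the numbers are invented for illustration only.
P = [
    [0.5, 0.25, 0.0, -0.25],
    [0.0, 0.5, -0.5, 0.25],
]

def fold_in(term_counts):
    """Project a raw term-count vector into the reduced space (a linear map)."""
    return [sum(p * c for p, c in zip(row, term_counts)) for row in P]

def add(u, v):
    """Element-wise sum of two equal-length vectors."""
    return [a + b for a, b in zip(u, v)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Raw term counts for two sentences that make up one paragraph (hypothetical).
sentence_1 = [2, 0, 1, 0]
sentence_2 = [0, 1, 0, 3]
paragraph = add(sentence_1, sentence_2)

# The paragraph vector obtained two ways: by projecting the paragraph's own
# term counts, and by summing the already-projected sentence vectors.
direct = fold_in(paragraph)
composed = add(fold_in(sentence_1), fold_in(sentence_2))

# Coarser-level comparison using only stored fine-grained vectors.
similarity = cosine(composed, fold_in(sentence_1))
```

Under this assumption, only the finest-grained vectors and a record of the document structure need to be stored; vectors for coarser units can be assembled on demand and compared with `cosine`.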
Nicholas Adelman, Marin Simina
Added 30 Oct 2010
Updated 30 Oct 2010
Type Conference
Year 2004
Authors Nicholas Adelman, Marin Simina