ICTAI
1999
IEEE

A New Study on Using HTML Structures to Improve Retrieval

Locating useful information effectively on the World Wide Web (WWW) is of wide interest. This paper presents new results on a methodology that uses the structures and hyperlinks of HTML documents to improve the effectiveness of retrieving them. The methodology partitions the occurrences of terms in a document collection into classes according to the tags in which each term appears (such as Title, H1-H6, and Anchor). The rationale is that terms appearing in different structures of a document may differ in how well they identify the document. The weighting schemes of traditional information retrieval were extended to include class importance values. We implemented a genetic algorithm to determine a "best so far" combination of class importance factors. Our experiments indicate that this technique can improve retrieval effectiveness by 39.6% or more.
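To make the idea of class importance values concrete, the following is a minimal sketch and not the authors' implementation. It counts term occurrences per tag class and then scales each count by a per-class importance value before summing. The class names, the tag-to-class mapping, the numeric weights in `CLASS_IMPORTANCE`, and the function `weighted_tf` are all assumptions made for illustration; the abstract says only that such values were searched for with a genetic algorithm and does not give them.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical class importance values (assumed for illustration only).
CLASS_IMPORTANCE = {"title": 4.0, "header": 3.0, "anchor": 2.0, "plain": 1.0}

# Assumed mapping from HTML tags to term classes.
TAG_TO_CLASS = {"title": "title", "a": "anchor"}
TAG_TO_CLASS.update({f"h{i}": "header" for i in range(1, 7)})


class ClassCounter(HTMLParser):
    """Counts occurrences of each (term, class) pair.

    A term's class is taken from the innermost open tag that maps to a
    class; text outside any such tag falls into the "plain" class.
    """

    def __init__(self):
        super().__init__()
        self.stack = []          # currently open classified tags
        self.counts = Counter()  # (term, class) -> occurrences

    def handle_starttag(self, tag, attrs):
        if tag in TAG_TO_CLASS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop back to, and including, the matching open tag.
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        cls = TAG_TO_CLASS[self.stack[-1]] if self.stack else "plain"
        for term in data.lower().split():
            self.counts[(term, cls)] += 1


def weighted_tf(html, importance=CLASS_IMPORTANCE):
    """Return term -> sum over classes of importance[class] * occurrences."""
    parser = ClassCounter()
    parser.feed(html)
    parser.close()
    tf = Counter()
    for (term, cls), n in parser.counts.items():
        tf[term] += importance[cls] * n
    return dict(tf)
```

Under these assumed weights, a term that appears once in the title contributes four times as much to its document's term frequency as the same term appearing once in body text. A search procedure such as a genetic algorithm would vary the values in `importance` and keep the combination that scores best on a retrieval effectiveness measure.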
Michal Cutler, H. Deng, S. Maniccam, Weiyi Meng
Added 03 Aug 2010
Updated 03 Aug 2010
Type Conference
Year 1999
Where ICTAI
Authors Michal Cutler, H. Deng, S. Maniccam, Weiyi Meng