

XML-aided phrase indexing for hypertext documents

We combine techniques of XML Mining and Text Mining for the benefit of Information Retrieval. By manipulating the word sequence according to the XML structure of the marked-up text, we strengthen phrase boundaries so that they are more obvious to the algorithms that extract multiword sequences from text. Consequently, the quality of the indexed phrases improves, which has a positive effect on the average precision measured by the INEX 2007 standards.
Categories and Subject Descriptors H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing--Indexing methods
General Terms Algorithms
Keywords XML, Phrase, Word sequence, Text mining, XML Retrieval
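The idea of letting XML structure mark phrase boundaries can be illustrated with a minimal sketch. It is not the authors' implementation: the tag names in BLOCK_TAGS, the helper names word_segments and bigrams, and the rule that block-level elements break phrases while inline elements do not are all assumptions made for illustration. The sketch splits the text into word segments at block-level element boundaries, so that candidate phrases (here simple bigrams) never span two structural units.

```python
import xml.etree.ElementTree as ET

# Assumption: these tag names are hypothetical. Which elements count as
# phrase-breaking depends on the document collection.
BLOCK_TAGS = {"article", "sec", "title", "p", "item"}


def word_segments(xml_text):
    """Split the text of an XML fragment into word segments.

    A new segment starts at the beginning and end of every block-level
    element. Inline elements (any tag not in BLOCK_TAGS) are transparent,
    so their words stay in the surrounding segment.
    """
    root = ET.fromstring(xml_text)
    segments = [[]]

    def close_segment():
        # Start a new segment only if the current one has words in it.
        if segments[-1]:
            segments.append([])

    def add_words(text):
        if text:
            segments[-1].extend(text.lower().split())

    def visit(elem):
        is_block = elem.tag in BLOCK_TAGS
        if is_block:
            close_segment()
        add_words(elem.text)
        for child in elem:
            visit(child)
            # The tail is the text that follows the child's end tag.
            add_words(child.tail)
        if is_block:
            close_segment()

    visit(root)
    return [seg for seg in segments if seg]


def bigrams(segments):
    """Return adjacent word pairs that do not cross a segment boundary."""
    return [(a, b) for seg in segments for a, b in zip(seg, seg[1:])]


example = (
    "<sec><title>Phrase indexing</title>"
    "<p>Text <it>mining</it> methods</p></sec>"
)
print(word_segments(example))
# [['phrase', 'indexing'], ['text', 'mining', 'methods']]
print(bigrams(word_segments(example)))
# [('phrase', 'indexing'), ('text', 'mining'), ('mining', 'methods')]
```

Without the structural boundary, a plain left-to-right word sequence would also yield the pair ('indexing', 'text'), which joins the end of the title to the start of the paragraph. Suppressing such pairs is one simple way to read the claim that stronger phrase boundaries improve the quality of the indexed phrases.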
Miro Lehtonen, Antoine Doucet
Added 15 Dec 2010
Updated 15 Dec 2010
Type Journal
Year 2008