
ICADL
2010
Springer

Thesaurus Extension Using Web Search Engines

Maintaining and extending large thesauri is an important challenge facing digital libraries and IT businesses alike. In this paper we describe a method that builds on and extends existing methods from the areas of thesaurus maintenance, natural language processing, and machine learning to (a) extract a set of novel candidate concepts from text corpora and (b) generate a small ranked list of suggestions for the position of these concepts in an existing thesaurus. Based on a modification of the standard tf-idf term weighting, we extract relevant concept candidates from a document corpus. We then apply a pattern-based machine learning approach to content extracted from web search engine snippets to determine the type of relation between the candidate terms and existing thesaurus concepts. The approach is evaluated in a large-scale experiment using the MeSH and WordNet thesauri as testbeds.
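As a rough illustration of the candidate extraction step, the following minimal sketch ranks terms that are not yet in a thesaurus by their summed tf-idf score. It uses the standard tf-idf weighting, not the modified weighting the abstract refers to, which is not specified here. The function name `tfidf_candidates`, the whitespace tokenization, and the `known_terms` set are assumptions made for this example only.

```python
import math
from collections import Counter


def tfidf_candidates(docs, known_terms, top_k=3):
    """Rank terms missing from the thesaurus by summed standard tf-idf.

    Hypothetical helper: plain tf-idf, not the paper's modified weighting.
    """
    # Naive tokenization (an assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = Counter()
    for tokens in tokenized:
        if not tokens:
            continue
        tf = Counter(tokens)
        for term, count in tf.items():
            if term in known_terms:
                continue  # already a thesaurus concept, not a candidate
            idf = math.log(n_docs / df[term])
            scores[term] += (count / len(tokens)) * idf

    # Drop terms that occur in every document (idf of zero).
    return [term for term, score in scores.most_common(top_k) if score > 0]
```

Calling `tfidf_candidates(corpus, thesaurus_terms)` on a list of document strings returns up to `top_k` candidate terms. Determining where each candidate belongs in the thesaurus is a separate step that this sketch does not attempt.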
Robert Meusel, Mathias Niepert, Kai Eckert, Heiner Stuckenschmidt
Added 19 Jul 2010
Updated 19 Jul 2010
Type Conference
Year 2010
Where ICADL
Authors Robert Meusel, Mathias Niepert, Kai Eckert, Heiner Stuckenschmidt