The bottleneck for dictionary-based cross-language information retrieval is the lack of comprehensive dictionaries, particularly across many different languages. We introduce a methodology by which multilingual dictionaries (for Spanish and Swedish) emerge automatically from simple seed lexicons. These seed lexicons are generated automatically, by cognate mapping, from previously manually constructed Portuguese, German, and English sources. Lexical and semantic hypotheses are then validated, and new ones iteratively generated, by exploiting co-occurrence patterns of hypothesized translation synonyms in parallel corpora. We evaluate the newly derived dictionaries on a large medical document collection in a cross-language retrieval setting.

Categories and Subject Descriptors
H.3.1 [Content Analysis and Indexing]: Dictionaries, Thesauruses; H.3.3 [Information Search and Retrieval]: Retrieval models

General Terms
Algorithms

Keywords
Cross-Language Information Retrieva...
Kornél G. Markó, Stefan Schulz, Olen