As the number of components in XML documents is much larger than that of ‘flat’ documents, we believe it is essential to provide users of XML information retrieval systems wi...
Structured documents, especially XML documents, are made up of a few logical components, such as title, sections, subsections and paragraphs. The components in each structured...
The World Wide Web is a natural setting for cross-lingual information retrieval. The European Union is a typical example of a multilingual scenario, where multiple users have to de...
In some information retrieval scenarios, for example internal help desk systems, texts are entered into the document collection without proofreading. This can result in a relative...
This paper presents Multilingual Document Clustering (MDC) on comparable corpora. Wikipedia, a structured multilingual knowledge base, has been heavily exploited in many monolingual...