This paper presents Multilingual Document Clustering (MDC) on comparable corpora. Wikipedia, a structured multilingual knowledge base, has been highly exploited in many monolingual...
We cope with the metadata recognition in layoutoriented documents. We address the problem as a classification task and propose a method for automatic extraction of relevant featu...
— The number of XML documents produced and available on the Internet is steadily increasing. It is thus important to devise automatic procedures to extract useful information fro...
Francesca Trentini, Markus Hagenbuchner, Alessandr...
The rapid growth of XML adoption has urged for the need of a proper representation for semi-structured documents, where the document structural information has to be taken into ac...
Abstract. XML query processors suffer from main-memory limitations that prevent them from processing large XML documents. While content-based predicates can be used to project down...