Searching for non-text data (e.g., images) is mostly done by means of metadata annotations or by extracting the text close to the data. However, supporting real content-based audi...
News articles about the same event published over time have properties that challenge NLP and IR applications. A cluster of such texts typically exhibits instances of paraphrase a...
Cross-language document retrieval systems require support by some kind of multilingual thesaurus for semantically indexing documents in different languages. The peculiarities of t...
This paper addresses conceptual modeling and automatic code generation for search engine integration with data intensive Web applications. We have analyzed the similarities (and di...
Alessandro Bozzon, Tereza Iofciu, Wolfgang Nejdl, ...
The problem of document classification considers categorizing or grouping of various document types. Each document can be represented as a bag of words, which has no straightforw...