Abstract. Identifying reliable and interesting items on the Internet is becoming increasingly difficult and time-consuming. This paper is a position paper describing our intend...
The cultural heritage domain dealing with digital surrogates of rare and fragile historic artifacts is one of the most promising areas for establishing collaboratories, i.e. shared...
One of the critical issues in search engines is the size of search indexes: as the number of documents handled by an engine increases, the search must preserve its efficiency,...
Table of contents (TOC) recognition has attracted a great deal of attention in recent years. After reviewing the merits and drawbacks of the existing TOC recognition methods, we h...
Automated extraction of structured data from Web sources often leads to large heterogeneous knowledge bases (KBs), with data and schema items numbering in the hundreds of thousands...