Today, Web pages are usually accessed using text search engines, whereas documents stored in the deep Web are accessed through domain-specific Web portals. These portals rely on e...
—Probabilistic topic models were originally developed and utilised for document modeling and topic extraction in Information Retrieval. In this paper we describe a new approach f...
A framework is presented for discovering partial duplicates in large collections of scanned books with optical character recognition (OCR) errors. Each book in the collection is r...
With the spread of digital cameras, shooting photos has been becoming an everyday affair. However, there are few methods or systems to manage photos simply, and a huge amount of ...
The need for descriptive information, i.e., metadata, about Web resources has been recognized in several application contexts (e.g., digital libraries, portals). The Resource Descr...
Sofia Alexaki, Vassilis Christophides, Gregory Kar...