Numerous approaches, including textual, structural and featural, to detecting duplicate documents have been investigated. Considering document images are usually stored and transm...
Time of creation is one of the predominant (often implicit) clustering strategies found not only in Data Warehouse systems: line items are created together with their correspondin...
The composition of services that are indexed in a large-scale service directory often involves many complex queries issued by the service composition algorithm to the directory. Th...
Information Dissemination applications are gaining increasing popularity due to dramatic improvements in communications bandwidth and ubiquity. The sheer volume of data available ...
In this paper we study how to build an effective incremental crawler. The crawler selectively and incrementally updates its index and/or local collection of web pages, instead of ...