: Since its creation in 1990, World Wide Web has increased the popularity of Internet which becomes an important source of information or services for all people over the world. Th...
Abstract. Crawling the deep web often requires the selection of an appropriate set of queries so that they can cover most of the documents in the data source with low cost. This ca...
Abstract. Distributed crawling has shown that it can overcome important limitations of the centralized crawling paradigm. However, the distributed nature of current distributed cra...
There is a great amount of information on the web that can not be accessed by conventional crawler engines. This portion of the web is usually called hidden web data. To be able t...
Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...