Today, the availability of large digital content archives (video, e-book, audio) creates many problems in terms of user interaction and data manipulation (browsing, searching). Many...
EuroGOV is a multilingual web corpus that was created to serve as the document collection for WebCLEF, the CLEF 2005 web retrieval task. EuroGOV is a collection of web pages crawl...
In this work, we compare different techniques for automatically finding candidate web pages to replace broken links. We extract information from the anchor text, the content of the p...
For decades, research environments have been changing as new technologies become available. Researchers have benefited from digital libraries, online databases, and Web search...
This paper presents the architectural design and evaluation results of an efficient Web-crawling system. The design involves a fully distributed architecture, a URL allocating algor...