This paper presents an extensive study of the evolution of textual content on the Web, showing how some new pages are created from scratch while others are created using al...
Nowadays, Web encyclopedias suffer from a high bounce rate. Typically, users come to an encyclopedia from a search engine and, upon reading the first page on the site, they leave it...
The traditional crawlers used by search engines to build their collection of Web pages frequently gather unmodified pages that already exist in their collection. This creates unne...
This paper considers the problem of identifying compound documents (cDocs) on the Web: groups of web pages that in aggregate constitute semantically coherent information entities...
Background: The web has seen an explosion of chemistry and biology related resources in the last 15 years: thousands of scientific journals, databases, wikis, blogs and resources ...
Egon L. Willighagen, Noel M. O'Boyle, Harini Gopal...