Web search engines discover indexable documents by recursively ‘crawling’ from a seed URL. Their rankings take into account link popularity. While this works well, it introduc...
Tom Rowlands, David Hawking, Ramesh Sankaranarayan...
For the task of near-duplicate document detection, both traditional fingerprinting techniques used in the database community and bag-of-words comparison approaches used in information...
We present a Semantic Web application that we call CS AKTive Space. The application exploits a wide range of semantically heterogeneous and distributed content relating to Compu...
Monica M. C. Schraefel, Nigel R. Shadbolt, Nichola...
We present GoGetIt!, a tool for generating structure-driven crawlers that requires minimal effort from users. The tool takes as input a sample page and an entry point to a W...
Altigran Soares da Silva, Edleno Silva de Moura, J...
Ontologies have proven to be invaluable tools both for the Semantic Web and for personal information management. In the context of a historical archive, an ontology may provide mean...
Elena Torou, Akrivi Katifori, Costas Vassilakis, G...